The pCri System: A Vector Collection for Recombinant Protein Expression and Purification

A major bottleneck in structural, biochemical and biophysical studies of proteins is the need for large amounts of pure homogeneous material, which is generally obtained by recombinant overexpression. Here we introduce a vector collection, the pCri System, for cytoplasmic and periplasmic/extracellular expression of heterologous proteins that allows the simultaneous assessment of prokaryotic and eukaryotic host cells (Escherichia coli, Bacillus subtilis, and Pichia pastoris). By using a single polymerase chain reaction product, genes of interest can be directionally cloned in all vectors within four different rare restriction sites at the 5′ end and multiple cloning sites at the 3′ end. In this way, a number of different fusion tags, as well as signal peptides, can be incorporated at the N- and C-terminus of proteins, facilitating their expression, solubility and subsequent detection and purification. Fusion tags can be efficiently removed by treatment with site-specific peptidases, such as tobacco etch virus proteinase, thrombin, or sentrin specific peptidase 1, which leave only a few extra residues at the N-terminus of the protein. The combination of different expression systems with a cloning approach based on vectors that can fuse various tags makes the pCri System a valuable tool for high-throughput studies.


Introduction
Researchers performing biochemical, biophysical and biological studies on proteins commonly require large amounts of pure homogeneous material, which cannot usually be purified from natural sources. Alternatively, proteins are over-expressed heterologously in various systems incorporating host cells of bacterial, yeast, insect, or mammalian origin [1][2][3]. A critical step in protein production, after target selection, is to examine as many parameters as possible and to identify the most promising strategy for protein expression and purification with a minimum of resources and time.
Prior information on the protein of interest is crucial. An extensive search in databases such as NCBI (http://www.ncbi.nlm.nih.gov), UniProt (http://www.uniprot.org) and PDB (http://www.pdb.org) for known homologous proteins may identify possible problems and appropriate solutions for subsequent experiments. In addition, it is advisable to test protein orthologs of different origin, including distantly related or unrelated species (bacteria, archaea, and eukaryotes). At this point, analysis of the primary and secondary structure of both the encoding mRNA and the translated polypeptide may anticipate downstream problems.
In parallel, cDNA characterisation is important in designing the cloning strategy and identifying potential problems at the transcriptional and translational levels. Although these processes are affected by a number of exo- and endonucleases, the stability of the resulting mRNA is critical in protein expression experiments [20]. mRNA can be protected by introducing sequences at the 5′ untranslated regions (UTRs) and stem-loop structures at the 3′ UTRs [21]. A high GC base content (>70%) may affect levels of expression and can be easily determined by sequence analysis software. Rare codons (GCUA 2.0) [22], especially consecutive ones, are frequently found in heterologous genes and may lead to translational errors due to ribosomal stalling [23,24]. Such codon bias can be remedied by replacing selected codons or, if necessary, by overall gene optimisation using appropriate software (OPTIMIZER) [25]. Once the above requirements are fulfilled, the gene can be inserted into the vector by directional cloning using restriction enzymes that do not cut within the gene sequence (NEBcutter) [26]. Efficiency of translation termination can be increased by introducing strong stop codons (UAA, especially when followed by a U base, or consecutive stop codons) at the end of the translated gene [27]. Although present in many expression vectors, transcription terminators can be included downstream of the transcribed gene if instability is predicted [28]. Finally, sources of cDNA can be found in the Mammalian Gene Collection (http://mgc.nci.nih.gov/) and at the home page of Culture Collection of the World (http://www.ecotao.com/holism/agric/hpcc.html).
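The sequence checks described above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the published workflow: the function names are hypothetical, the toy sequence is invented, and the dedicated tools cited in the text (GCUA, OPTIMIZER, NEBcutter) remain the appropriate choice for real projects. The recognition sequences listed are those of the enzymes used for cloning in this collection; all five are palindromic, so scanning one strand is sufficient.

```python
"""Minimal pre-cloning checks: GC content, internal cloning sites, terminal stop codon."""

# Recognition sequences (5'->3') of the enzymes used for directional cloning.
# All five are palindromic, so only the coding strand needs to be scanned.
CLONING_SITES = {
    "NcoI": "CCATGG",
    "MscI": "TGGCCA",
    "NdeI": "CATATG",
    "NheI": "GCTAGC",
    "XhoI": "CTCGAG",
}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def internal_sites(seq: str, sites=CLONING_SITES) -> dict:
    """Map enzyme name -> list of 0-based positions where its site occurs in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


def terminal_stop(seq: str):
    """Return the in-frame stop codon at the 3' end of an ORF, or None if absent."""
    seq = seq.upper()
    codon = seq[-3:]
    return codon if len(seq) % 3 == 0 and codon in STOP_CODONS else None


if __name__ == "__main__":
    toy_gene = "ATGGCTAGCAAAGGCGAACTCGAGTAA"  # invented sequence for illustration
    print(f"GC content: {gc_content(toy_gene):.1f}%")
    print("Internal cloning sites:", internal_sites(toy_gene))
    print("Terminal stop codon:", terminal_stop(toy_gene))
```

A gene that reports internal sites for the enzymes intended for cloning would need either a different enzyme pair or silent mutations before amplification.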
No expression system is generic for all target proteins, so both bacterial and eukaryotic systems need to be explored. Escherichia coli is the cheapest and most widely used expression host, but its machinery is not as sophisticated as that of eukaryotic hosts, and it cannot always express well-folded proteins of variable origin [15]. Other alternatives often need to be tested, including bacterial systems such as Bacillus subtilis [29] and more advanced eukaryotic systems such as the yeasts Pichia pastoris [1] and Saccharomyces cerevisiae [30], the baculovirus expression system in insect cells [3], mammalian cells [31], or cell-free systems using prokaryotic extracts [32], which have highly variable cost-efficiency ratios.
With E. coli alone, many variables can be tested in order to improve expression levels and achieve proper protein folding [2,33]. A number of specialised strains carrying mutations [34,35] or plasmids that co-express proteins favouring expression at the transcriptional or translational level (e.g. pRARE or pLysE/pLysS) are available [24,36]. Coupled expression of exogenous chaperones can assist in proper folding and prevent protein aggregation [37,38]. Expression can also be influenced by other parameters, such as the culture method (e.g. batch fermentation, fed batch and dialysis fermentation) [39], cell growth media composition (lysogeny broth (LB), the enriched terrific broth (TB), two times yeast and tryptone broth (2×YT), and auto-induction media) [40], and culture conditions like temperature (18-37 °C), shaking, aeration and other physical variables. All these factors can affect production levels, secretion, protein folding, solubility and host proteolytic activity [41,42].
The many systems for introducing fusion tags currently available were originally developed to facilitate the detection and purification of recombinant proteins. Tags such as polyhistidine (His6-tag) and streptavidin-binding peptide (Strep-tag) allow purification by affinity chromatography and protein detection by Western blotting [43,44], and others such as C-terminally fused green fluorescent protein (GFP) are an indispensable tool for membrane protein biochemists [45]. Finally, several studies have shown that the introduction of tags at the N- or C-terminus of proteins can improve expression levels by providing an optimized environment for translation initiation and mRNA protection, protein solubility [46][47][48], and carrier-driven crystallisation experiments [49].
Here we present a collection of vectors with which various expression systems and fusion tags can be evaluated simply and effectively. We examine the applicability of this system and provide several test cases, which support its robustness and versatility. This vector collection, which has been extensively tested and modified, is freely available to the scientific community under Addgene (https://www.addgene.org).

Genetic manipulations and vector preparation
Three series of vectors were generated on the basis of vectors available from the European Molecular Biology Laboratory (pETMBP-1a, pETTRX-1a, and pETGST), Novagen (pET-26b, and pET-28a), MoBiTec (pHT-01, and pHT-43), Invitrogen (pPICZA and pPICZαA), and from the Glockshuber laboratory (pRBI-DsbC) [50]. The inserted sequences for pCri-11, 13, and 14 were amplified from pET-15b-SUMO1 [51], pMIS3.0E [52], and pKLSLt [53], respectively. All vectors were prepared for directional cloning in NcoI or NdeI restriction sites at the 5′ end and in XhoI at the 3′ end. The gene coding for GFP (UniProt code: B6UPG7; 729 bp), including a multiple cloning site (MCS; from pETMBP-1a; 52 bp), was introduced into all vectors. The insert was cloned between the NcoI or NdeI and XhoI restriction sites and was modified to contain an MscI or NheI restriction site immediately after the NcoI and NdeI sites, respectively. Standard cloning techniques were used throughout [54]. Polymerase chain reaction (PCR) primers and DNA modifying enzymes were purchased from Sigma-Aldrich and Thermo-Scientific, respectively. PCR was performed using Phusion high-fidelity DNA polymerase (Thermo-Scientific) according to the manufacturer's instructions and following a standard optimisation step of a thermal gradient in each reaction. Vector preparation required a number of insertions and mutations to introduce or eliminate nucleotide sequences. We followed a PCR-based strategy described elsewhere [55], including a DpnI digestion step to remove parental DNA. Digestion with restriction enzymes was carried out according to standard protocols. When necessary, a second round of digestion was performed before the final DNA purification step. DNA was purified from PCR reactions, enzymatic reactions, agarose gel band extractions, and vector extractions using OMEGA-Biotek purification kits. Chemically competent E. coli DH5α, BL21 (DE3), and Origami 2 (DE3) cells (Novagen) were prepared and transformed following the Hanahan method [56]. Competent cells of P. pastoris KM71H (Invitrogen) and B. subtilis WB800N (MoBiTec) were prepared according to the manufacturer's instructions.
For expression trials in P. pastoris cells, vectors were linearized with PmeI restriction enzyme and transformed using the Pichia EasyComp transformation kit (Invitrogen). Cells were plated on low salt yeast peptone dextrose (YPD) plates supplemented with 100 µg/mL zeocin and incubated for 3-4 days at 28 °C. Colonies were selected and grown in 100 mL buffered complex glycerol medium (BMGY) at 28 °C until an OD600 of ~2. Cells were then harvested and resuspended in buffered complex methanol medium (BMMY), and protein expression was induced with 0.5% methanol.
Cells were separated from the growth media by centrifugation at 8,000 × g for 30 min at 4 °C. Secreted proteins were collected from the growth media and dialysed in buffer A (50 mM Tris-HCl, 250 mM NaCl, pH 7.5), and cytoplasmic proteins were extracted from the cells in the same buffer. For lysis, cells were sonicated with 3 pulses of 5 min each at 40% amplitude (Branson digital sonifier). Samples were collected before and after centrifugation (30,000 × g for 30 min at 4 °C) representing total and soluble protein fractions, respectively.
Selected samples were further purified by affinity chromatography using either nickel-nitrilotriacetic acid (Ni-NTA), maltose binding protein (MBP) or glutathione S-transferase (GST) HiTrap columns, or a Sepharose 4B matrix column (GE Healthcare Life Sciences). 10 mL of crude protein extract was applied to the columns, followed by three washes with buffer A. Proteins were eluted with buffer A supplemented with either 300 mM imidazole (Ni-NTA affinity), 10 mM maltose (MBP affinity), 10 mM reduced glutathione (GST affinity) or 20 mM lactose (Sepharose affinity). Finally, samples were buffer-exchanged to buffer B (20 mM Tris-HCl, 150 mM NaCl, pH 7.4) using a PD-10 desalting column (GE Healthcare Life Sciences). Samples were kept at 4 °C at all times.
For expression and purification of MecR1, the cultures were scaled up to 6 L. The collected cells were broken with a cell disrupter (Constant Cell Disruption Systems) at 2.4 kbar, and non-disrupted cells and cell debris were removed by centrifugation at 20,000 × g for 45 min in a Sorvall centrifuge. Membranes were collected by ultracentrifugation at 150,000 × g for 2 h at 4 °C in a Beckman Optima L-90K using a 50.2 Ti rotor (Beckman) and 26.3-mL polycarbonate bottles with cap assembly (Beckman). Collected membranes were homogenized using a glass Potter and solubilized under gentle stirring by overnight incubation at 4 °C in buffer C (50 mM Tris-HCl, 300 mM NaCl, 10 mM imidazole, 1 mM 1,4-dithio-D-threitol, pH 8.0) containing 100 mM lauryldimethylamine N-oxide (LDAO; Sigma) and EDTA-free proteinase inhibitor cocktail tablets (Roche). Non-solubilized proteins were removed by ultracentrifugation as described above. The sample was incubated overnight at 4 °C with Ni-NTA resin (Invitrogen). The bound protein was batch purified in an open column (Bio-Rad), washed extensively, and the tagged protein eluted with buffer C plus 300 mM imidazole. The sample was desalted using a PD-10 column in buffer C containing 5 mM LDAO.

Fusion-tag removal by proteinase cleavage
Tobacco etch virus (TEV) proteinase and sentrin specific proteinase 1 (SENP1) were over-expressed in E. coli BL21 (DE3) pLysE cells using pET28-based vectors, which attach an N-terminal His6-tag. Cultures (typically 4 L) were grown in LB broth at 37 °C until an OD600 of ~0.7-0.8, induced with 0.5 mM IPTG, and incubated either overnight at 20 °C or for 5 h at 30 °C for TEV proteinase or SENP1 expression, respectively. Subsequently, cells were collected by centrifugation at 5,000 × g for 30 min at 4 °C and the proteinases were partially purified by Ni-NTA affinity chromatography as previously described [57,58]. Proteinases were stored at −80 °C in buffer D (20 mM Tris-HCl, 50 mM NaCl, pH 7.5, 30% glycerol). Proteinase cleavage trials of tagged proteins were performed overnight at 4 °C in buffer B using various protein:proteinase ratios. For trials with thrombin (GE Healthcare Life Sciences), 2 units of proteinase were used to process 25 µg of protein in 100 µL of buffer C at room temperature and aliquots were taken at various time points.

Enzymatic assays
For hydrolytic activity measurements, PNGase F and fragilysin were partially purified by Ni-NTA affinity chromatography as described above. Glycosidase activity of PNGase F was tested against the glycoprotein ribonuclease B (RNase B; New England Biolabs) at a w/w ratio of 1:5 PNGase F/RNase B and a final protein concentration of 0.5 mg/mL. Reactions were incubated overnight at 4 °C and analysed by SDS-PAGE. Peptidase activity of fragilysin was tested against BODIPY FL-casein (Invitrogen) as previously described [59]. Crude protein extracts of CPA2 were used for assays after an initial activation with partial tryptic digestion in a w/w ratio of 1/100 of CPA2/trypsin at room temperature for 1 h. The activated protein was incubated with furyl-acryloyl-L-phenylalanine-L-phenylalanine (0.05 mM; Sigma) in buffer B and the activity was monitored by measurement of the absorbance change at 330 nm.

Western-blot analysis
Protein samples were analyzed by Tricine-SDS-PAGE, transferred to Hybond ECL membranes (GE Healthcare Life Sciences), and finally blocked overnight at room temperature with 20 mL of blocking solution (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.47 mM NaH2PO4, 0.05% Tween 20) containing 1.5% bovine serum albumin. MecR1 was detected by immunoblot analysis using custom polyclonal antibodies (Eurogentec) at dilution 1:1,000 and a secondary antibody (goat anti-rabbit IgG (H+L) peroxidase-conjugated antibody; Pierce) at dilution 1:5,000 (both in blocking solution). The immune complexes were detected using an enhanced chemiluminescence system (Super Signal West Pico Chemiluminescent; Pierce) according to the manufacturer's instructions. Membranes were exposed to Hyperfilm ECL films (GE Healthcare Life Sciences).
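As a worked example of the arithmetic behind the blocking solution, the following Python sketch converts the stated millimolar concentrations into masses for a given volume. It is illustrative only. The molar masses assume anhydrous salts (the hydration state of the reagents is not stated in the text), the percentages are assumed to be w/v for bovine serum albumin and v/v for Tween 20, and the function names are hypothetical.

```python
"""Convert the blocking-solution recipe into amounts for a chosen volume."""

# Molar masses in g/mol, assuming ANHYDROUS salts (an assumption; adjust for hydrates).
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "NaH2PO4": 119.98,
}

# Concentrations in mM, as given for the blocking solution.
RECIPE_MM = {
    "NaCl": 137.0,
    "KCl": 2.7,
    "Na2HPO4": 4.3,
    "NaH2PO4": 1.47,
}


def salt_masses_mg(volume_ml: float, recipe_mm=RECIPE_MM) -> dict:
    """Return the mass of each salt in mg for the given volume in mL.

    mmol/L x L gives mmol; mmol x g/mol gives mg.
    """
    litres = volume_ml / 1000.0
    return {salt: conc * litres * MOLAR_MASS[salt] for salt, conc in recipe_mm.items()}


def percent_amount(volume_ml: float, percent: float) -> float:
    """Amount for a percentage solution: g for w/v, or mL for v/v, per volume in mL."""
    return volume_ml * percent / 100.0


if __name__ == "__main__":
    volume = 20.0  # mL, the volume used for blocking
    for salt, mass in salt_masses_mg(volume).items():
        print(f"{salt}: {mass:.2f} mg")
    print(f"BSA (1.5% w/v, assumed): {percent_amount(volume, 1.5):.2f} g")
    print(f"Tween 20 (0.05% v/v, assumed): {percent_amount(volume, 0.05) * 1000:.0f} µL")
```

In practice such buffers are usually prepared as concentrated stocks and diluted, so the per-20-mL amounts mainly serve as a sanity check on the recipe.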

Miscellaneous
Denatured protein samples were analyzed by 10%-15% Tricine-SDS-PAGE [60] and stained with Coomassie Brilliant Blue. Protein concentrations were routinely determined by absorbance at 280 nm and, wherever necessary, corrected by the BCA protein assay method (Thermo Scientific) using bovine serum albumin as a standard. Protein identification by peptide mass fingerprinting was performed at the Protein Chemistry Facility of Centro de Investigaciones Biológicas (Madrid, Spain).

Description of the pCri System
We generated a collection of vectors for recombinant protein overexpression in two bacterial (E. coli and B. subtilis) and one eukaryotic (P. pastoris) host strains. Vectors, available from commercial sources or laboratories, were initially modified by inserting new nucleotide sequences or point mutations, and finally evaluated for functionality. Most of the E. coli vectors are pET based [61] with the exception of pCri-12, which is based on pTrc99a [50]. The bacillus and yeast vectors are based on pHT [62] and pPICZ series [63], respectively, and can be stably propagated in E. coli cells when antibiotic resistance is conferred (Tables 1-3). In all vectors, protein expression is achieved by IPTG induction, except for the yeast vectors, for which methanol is required.
The collection consists of 29 vectors grouped into three main categories (Tables 1-3). Based on the available 5′ end restriction sites for target gene cloning, the vectors are sorted into pCri-a and pCri-b series using either NcoI and MscI or NdeI and NheI sites, respectively (Fig. 1 and Fig. S1). The pCri-a series is further divided into pCri-a and pCri-a-Strep according to the fusion tag that can be attached at the C-terminus of the target protein.
Within each category, the vectors allow obtaining constructs with different fusion tags or expression in a particular host organism. Usage of the aforementioned 5′ end restriction sites incorporates a methionine start codon, thus obviating the need to introduce it into the target gene during PCR amplification. An MCS universal for all vectors has been placed at the 3′ end, which encodes seven rare restriction sites not found in most of the vectors (see vector maps for more details; Fig. S1). For convenience and tracking during vector preparation, a GFP insert is cloned within all vectors. The inserted genes can be sequenced from either terminus with specific primers as detailed in Table 4.
Preparation is greatly simplified, as only two restriction sites are used for directional cloning of a target gene into a large series of vectors. Although newer cloning techniques are now available (e.g. ligation independent cloning system [64]), this method was satisfactory. Cloning of target genes of variable size, from 150 to 7,000 base pairs, was routinely performed with a success rate of more than seven positive clones out of ten when genes were cloned between an NcoI or NdeI site and an XhoI site. To achieve reproducible results, it was essential to repeat double digestions of the vectors with all the restriction enzyme combinations.
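The two-site cloning strategy can be illustrated with a short Python sketch that appends an NdeI site to the forward primer and an XhoI site to the reverse primer. This is a simplified sketch under stated assumptions, not the primer design procedure used for the collection: the function names, the four-base 5′ clamp ("GCGC"), the default annealing length and the toy open reading frame are all hypothetical, and melting temperature, the reading frame across the junction with a C-terminal tag, and the NcoI/MscI variant are not handled. The reading frame in particular should be verified against the vector maps.

```python
"""Sketch: primers for directional cloning between NdeI (5' end) and XhoI (3' end)."""

NDEI = "CATATG"  # the ATG within the site supplies the start codon
XHOI = "CTCGAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(orf: str, anneal_len: int = 18, clamp: str = "GCGC",
                   keep_stop: bool = True):
    """Return (forward, reverse) primers, both written 5'->3'.

    keep_stop=False drops a terminal stop codon so that a C-terminal tag
    encoded by the vector can be fused to the target protein.
    """
    orf = orf.upper()
    for name, site in (("NdeI", NDEI), ("XhoI", XHOI)):
        if site in orf:
            raise ValueError(f"{name} site found inside the insert")
    # The start codon is provided by the NdeI site, so a leading ATG is dropped.
    coding = orf[3:] if orf.startswith("ATG") else orf
    if not keep_stop and coding[-3:] in STOP_CODONS:
        coding = coding[:-3]
    if len(coding) < anneal_len:
        raise ValueError("insert shorter than the requested annealing length")
    forward = clamp + NDEI + coding[:anneal_len]
    reverse = clamp + XHOI + reverse_complement(coding[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    toy_orf = "ATGAAAGGTGAAGAACTGTTCACCGGTTAA"  # invented sequence for illustration
    fwd, rev = design_primers(toy_orf, anneal_len=12, keep_stop=False)
    print("Forward:", fwd)
    print("Reverse:", rev)
```

Because the same pair of sites is shared by a whole vector series, a single PCR product made with such primers can in principle be digested once and ligated into each vector of that series.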

Applications and main considerations of the pCri System
The choice and use of a suitable vector should be based on the properties of the target protein and the needs of the experiment in question. Here, in an effort to evaluate the functionality of the collection and to provide a rationale for the use of the vectors, we cloned and expressed several proteins of different origin and function.
Fusion tags assisting in protein purification. The pCri System allows the fusion of a His6-8-tag at the N-terminus of the target protein, which can be in tandem with larger tags such as MBP [43], GST [65], small ubiquitin-like modifier (SUMO) [66], and the β-trefoil lectin module of protein LSL150 from the mushroom Laetiporus sulphureus (LSL) [53] (Tables 1-3). The C-terminus of the target protein can likewise be furnished with a His6-tag or a Strep-tag if the stop codon of the amplified gene is omitted. These tags add a functionality to the target protein, which is commonly used as a first purification step through affinity chromatography [43,53,65]. On this basis, we cloned and expressed GFP in pCri-1a, 4a, 6a, 8a, 11a, and 14a. The proteins were purified by Ni-NTA affinity chromatography except for MBP, GST, and LSL fusion products, which were purified by their specific affinity resins (Fig. 2A). Nickel or cobalt affinity chromatography of His6-tagged proteins is among the most commonly used methods for purification, but others using the affinity properties of MBP or GST, and the recently reported LSL150, can provide better purification results under mild elution conditions. This choice among alternative affinity purification systems allows the best purification method to be used for each target protein. Moreover, many of those tags can be used to track poorly expressed proteins by Western-blot analysis, as they are otherwise undetectable by Coomassie-stained SDS-PAGE.
Fusion tags assisting in protein solubility. In addition, several studies showed that tags such as N-utilisation substrate A (NUSA), MBP, or the smaller GST and SUMO have positive effects on the cargo protein due to their solubility-enhancing or chaperoning properties [2,66,67]. Nevertheless, their working mechanism is still controversial, with several studies suggesting a more passive role due to their excellent solubility properties rather than a direct influence on the folding of their partner [47]. For example, fragilysin (Ala212-Asp397) [59], a bacterial enterotoxin metallopeptidase, was expressed in high amounts in fusion with MBP, TRX, GST, and His6-tag, both at 37 °C and at 20 °C (Fig. 2B). However, only MBP rendered the protein soluble during low temperature expression trials, whereas other fusions or expression at higher temperatures produced protein prone to aggregation. The protein remained in solution even after MBP removal (Fig. 2C) but was catalytically inactive against fluorescent-labelled casein, indicating at least partial misfolding. Similar results were obtained when fragilysin was expressed with the smaller Z-tag (~10 kDa) [59], indicating that fusion proteins may have a positive effect on target solubility without necessarily implying that the target will be well folded and active. Nevertheless, these fusion tags can have an application in the expression of proteins with known solubility problems that need to be temporarily stabilised until an adequate condition/solution is found [67].
Expression of proteins requiring disulfide bonds and other posttranslational modifications. Correct folding and stabilization requires the formation of disulfide bonds in many proteins. These can be formed in oxidising environments as found in the periplasmic and extracellular environment of bacteria, or in specialised organelles of eukaryotes. B. subtilis has a large secretory capacity, whereas in E. coli secretion is mainly limited to the periplasm [68,69]. In P. pastoris, proteins are first driven to the endoplasmic reticulum and, after folding, they are secreted to the extracellular medium [70]. The pCri System includes vectors that fuse signal peptides (SP) specialised for protein translocation to these cellular compartments. pCri-9 and 12 can be used with E. coli cells, whereas pCri-16 and 18 are suitable for expression in P. pastoris and B. subtilis, respectively. In the case of pCri-12, a disulfide-bond isomerase C (DsbC) is coexpressed with the target protein and provides additional support in the correct pairing of disulfide bonds in the periplasm [50].
As a test protein, we used human CPA2, which is commonly expressed in P. pastoris cells [71]. Unexpectedly, expression trials indicated that the protein is produced not only in the extracellular environment of P. pastoris but also in the cytoplasm and periplasm of E. coli cells (Fig. 2D). In contrast, B. subtilis did not express the protein either extracellularly or intracellularly. In all cases, the protein was soluble and correctly processed after limited tryptic digestion, showing activity against small substrates. However, this is not always the case. Besides the oxidising conditions, other proteins may often participate in correct folding, including oxidases, foldases, isomerases and specialised chaperones [69]. Moreover, disulfide bond formation is not the only factor in proper protein folding and stability, and further posttranslational modifications (e.g. glycosylation) may be required, which can be provided by P. pastoris [70].
Table 3. pCri-a-Strep vectors. For introducing a His6-tag at the C-terminus of the target protein, use a reverse primer without a stop codon.
pCri System. PLOS ONE | www.plosone.org
Another approach for disulfide bond formation exploits the oxidising cytoplasm of thioredoxin reductase B (trxB−) and glutathione reductase (gor−) mutant E. coli cells (Origami 2) [34]. In contrast to the commonly used BL21 cells, Origami 2 efficiently expressed PNGase F, either with pCri-4a or 8a, soluble and catalytically active against RNase B (Fig. 2E and 2F). The protein contains disulfide bonds that require an oxidising environment, which is adequately formed in the cytoplasm of mutant cells. In addition, the combined use of thioredoxin A (TRX) as fusion protein in pCri-4a and expression in Origami 2 can lead to the overexpression of small multi-disulfide proteins, among others [69,72]. This system takes advantage of TRX, which acts as an oxidant when it operates in an oxidized milieu found in mutant cells [34], thus providing an additional mechanism for disulfide bond formation within the cytoplasm. TRX is subsequently removed by TEV proteinase cleavage in the presence of selected amounts of redox agents to assist in correct disulfide bond pairing [72,73].
Expression of membrane proteins. Membrane proteins are among the most requested targets and, at the same time, among the most difficult to express and purify. To address this issue, a vector was prepared that fuses a small protein from B. subtilis with target proteins (pCri-13a). This protein, known as the membrane-integrating sequence for translation of integral-membrane protein constructs (MISTIC), folds autonomously into membranes, simultaneously dragging the tagged protein to the cell membrane [74]. Moreover, this vector contains a longer His8-tag in tandem with MISTIC in order to provide higher affinity for Ni-NTA affinity purification. For evaluation purposes, we cloned and expressed MecR1, a membrane metallopeptidase from Staphylococcus aureus implicated in methicillin resistance [75]. Detectable levels of expression were only achieved when the protein was fused with MISTIC, whereas mere fusion with an N-terminal His6-tag was unsuccessful (Fig. 2G). Moreover, expression yields of the protein were sufficient (0.4 mg of affinity purified protein per litre of culture) to enable partial purification by Ni-NTA affinity chromatography after solubilisation of the membranes with the zwitterionic detergent LDAO (Fig. 2H). Although further studies are required to assess the folding state of this protein, fusion with MISTIC allowed us to express it in milligram amounts. Many other membrane proteins were also expressed in fusion with various MISTIC constructs, indicating that this system could be an alternative approach for membrane proteins that are difficult to express [52].
Figure 2 (legend, partially recovered). (A) The GFP gene was cloned into pCri-1a, 4a, 6a, 8a, 11a, and 14a, the proteins expressed in E. coli BL21 cells, and subsequently purified by Ni-NTA affinity chromatography except for MBP, GST, and LSL fusion products, which were purified by their respective specific affinity resins. (B) The gene coding for fragilysin was cloned into pCri-1a, 4a, 6a and 8a, and expressed in E. coli Origami 2 cells. Total (T) and soluble (S) fractions of crude protein extracts were further analysed by SDS-PAGE. All expression trials were performed at 20 °C except for pCri-1a, which was also performed at 37 °C.
Removal of fusion tags. In most cases, release of the target protein from any fused tag is desirable. In the pCri System, a TEV proteinase cleavage site is introduced immediately after the tag in all vectors except for pCri-11 and pCri-13, in which a SENP1 or thrombin site is found, respectively (Fig. S1). TEV proteinase is a highly specific enzyme that recognises a hexapeptide sequence [76], whereas SENP1 further offers robustness and high proteolytic activity in addition to high specificity, usually requiring only minute amounts for tag removal [77]. Moreover, use of a thrombin cleavage site in pCri-13a was necessary due to the low efficiency of TEV proteinase in the presence of detergents, which are required during membrane-protein solubilisation [78]. In addition, linker sequences (Gly-Ser)5 and Gly3-Ala were introduced before and after the thrombin recognition site, respectively, to improve access for proteinase cleavage (Fig. S1) [79].
Tag removal was achieved with variable amounts of endopeptidases, different incubation times and temperatures (Fig. 2I). These studies indicated that optimisation trials are needed in each case to identify the best conditions for complete digestion (e.g. buffer, temperature, proteinase:substrate ratio). Proteinase cleavage and tag removal result in the incorporation of one or two extra residues at the N-terminus of the expressed protein except for pCri-4b and pCri-13a, which attach three and six residues, respectively (Tables 1-3).

Conclusions
Here we introduce a vector collection designed for large-scale recombinant protein overexpression, and demonstrate its suitability in a series of test proteins. The choice of a suitable expression vector should be based on target and tag properties. The availability of a range of fusion tags allows the choice between different affinity purification methods. Moreover, some tags were included for specific use, such as MISTIC and TRX, which are intended for expression of membrane and disulfide-rich proteins, respectively. In general, our common strategy first explores the effects of the presence or absence of N- or C-terminal tags (e.g. His6-tag or Strep-tag) on each construct under different host cell growth conditions. Omitting the tag or altering its position can drastically influence the expression and solubility of the protein. If this approach is ineffective, the chances of optimising the expression by testing other fusion combinations are reduced. Several reports showed the beneficial effects of the fusions on target solubility [2,67]. However, this is not always the case: the fusion partner often merely drags the protein into solution rather than acting as a chaperone for its proper folding. Removal of the fusion tag can then reverse the positive effect and cause precipitation [47,48]. If this occurs, then modified constructs, other homologous targets or even other expression systems need to be explored, including bacterial and eukaryotic cells that can be easily tested using the vector collection of the pCri System.