Genomic Analysis of Sleeping Beauty Transposon Integration in Human Somatic Cells

The Sleeping Beauty (SB) transposon is a non-viral integrating vector system with proven efficacy for gene transfer and functional genomics. However, integration efficiency is negatively affected by the length of the transposon. To optimize the SB transposon machinery, the inverted repeats and the transposase gene underwent several modifications, resulting in the generation of the hyperactive SB100X transposase and of the high-capacity “sandwich” (SA) transposon. In this study, we report a side-by-side comparison of the SA and the widely used T2 arrangement of transposon vectors carrying increasing DNA cargoes, up to 18 kb. Clonal analysis of SA integrants in human epithelial cells and in immortalized keratinocytes demonstrates stability and integrity of the transposon independently from the cargo size and copy number-dependent expression of the cargo cassette. A genome-wide analysis of unambiguously mapped SA integrations in keratinocytes showed an almost random distribution, with an overrepresentation in repetitive elements (satellite, LINE and small RNAs) compared to a library representing insertions of the first-generation transposon vector and to gammaretroviral and lentiviral libraries. The SA transposon/SB100X integrating system therefore shows important features as a system for delivering large gene constructs for gene therapy applications.


Introduction
The Sleeping Beauty (SB) transposon is a member of the Tc1/mariner transposon superfamily. Tc1/mariner elements are generally 1,300-2,400 bp in length and contain a single gene coding for the transposase that is flanked by terminal inverted repeats (IRs). The IRs of SB each host a pair of binding sites containing short, 15-20 bp direct repeats (DRs). Both the outer and the inner pairs of transposase-binding sites are required for transposition. The SB transposase binds the IRs in a sequence-specific manner, and mediates precise cut-and-paste transposition in a wide variety of vertebrate cells including human cells [1-3]. For this reason, the SB-based integration system is a valuable tool for functional genomics in several model organisms and represents a promising vector for human gene therapy [4,5]. However, a major bottleneck of any transposon-based application is the low transposition efficiency. Therefore, considerable effort was dedicated to improving the SB integration machinery by modifying its IRs and systematically mutating the transposase gene. In 2002, Cui et al. carefully explored the structure and functions of the IRs. They modified the outer and inner DR sites of both IRs and the spacer sequence between the DRs, generating a new version of the transposon IR, called T2, with fourfold increased transposition efficiency [6]. However, the transpositional activity of this system (and that of the first-generation transposon [7]) is negatively affected by the size of the transposon, resulting in an exponential drop for every kb introduced between the two IRs.
In 2004, Zayed et al. constructed the "sandwich" (SA) version of the transposon vector [8]. The SA transposon consists of two complete transposon elements in a head-to-head orientation, flanking a DNA expression cassette, thereby forming a sandwich-like arrangement. Mutation of the 5′ terminal CA nucleotides of the right IR abolishes cleavage at the innermost transposon ends; therefore, only the four terminal DRs represent the catalytic substrate for the "cut and paste" transposition. The SA transposon showed a 3.7-fold enhanced activity over the first-generation transposon in integrating a ~7.5 kb DNA sequence upon SB10 transposase delivery. Five years later, a transposase 100-fold more active than SB10, named SB100X, was developed by a high-throughput, PCR-based DNA shuffling strategy [1]. The improved integration efficiency associated with SB transposition opened new avenues for its application. The hyperactive SB100X transposase was employed to obtain highly efficient germline transgenesis in pigs [9,10], rabbits [11] and rodents [12,13], stable transfer of therapeutic genes in clinically relevant cells [1,14-18], and reprogramming of mouse embryonic and human foreskin fibroblasts into iPS cells [19].
In this study, we investigated the integration efficiency of large expression cassettes mediated by the optimized SB elements: the SA transposon and the SB100X transposase. We report a side-by-side comparison between the SA and the T2 transposons carrying DNA cargo of increasing length. We performed a deep molecular characterization of SA-mediated integrants in epithelial cell lines and in primary immortalized keratinocytes, stressing the SB system with cargos up to 18 kb. These data provide evidence for the stability of SB-mediated integration and the reproducibility of the cut-and-paste mechanism even with large transposons embedded between two double IRs. Moreover, clonal analysis reveals a linear correlation between the number of transposon copies harboured in the genomic DNA and their expression, an important characteristic for gene therapy applications. Finally, high-resolution, genome-wide mapping of SA integrations in human keratinocytes revealed a close-to-random integration pattern with respect to genes and chromosomes, highlighting a relatively low risk of genotoxicity, as previously reported for SB transposition in cell lines [20-23]. Interestingly, the high-throughput analysis of SA integration sites showed an overrepresentation of integration events in repetitive elements (RE) of the human genome, in particular satellite, small RNA and LINE elements.

Cell culture
HeLa cells were cultured in DMEM medium (Lonza) supplemented with 10% Fetal Bovine Serum (FBS), 1% L-Glutamine (L-Gln) and 1% Penicillin-Streptomycin (Pen/Strep). For each experiment, an aliquot of cryo-preserved HeLa cells was thawed and plated on 8 cm dishes. Upon reaching 80-90% confluency, cells were re-plated on 6-well culture plates at a density of 2 × 10⁵ cells/well. After 24 h, cultures in each well were at 70-80% confluency, ready to be transfected.
We used SV40-immortalized keratinocytes derived from a patient affected by generalized atrophic benign epidermolysis bullosa (GABEB), produced by Borradori et al. [24] and kindly provided by J.W. Bauer. GABEB cells were cultivated in EpiLife medium supplemented with human keratinocyte growth supplement (HKGS) (Invitrogen, US). EpiLife is a serum-free keratinocyte culture medium with a low calcium concentration (0.06 mM); supplementation with HKGS results in a final concentration of 0.2% (v/v) BPE, 5 µg/mL bovine insulin, 0.18 µg/mL hydrocortisone, 5 µg/mL bovine transferrin and 0.2 ng/mL human EGF. Upon reaching 80-90% confluency, cells were re-plated on 6-well culture plates at a density of 2.3 × 10⁵ cells/well. After 24 h, cultures in each well were at 70-80% confluency, ready to be transfected.

Plasmid constructs
The plasmid carrying the T2 IRs, including a Venus reporter gene driven by the chicken β-actin promoter fused to the CMV early enhancer element (CAGGS), and the construct coding for SB100X were described in Mates et al. [1]; the SA transposon IRs were described in Zayed et al. [8]. The CAGGS Venus expression cassette was excised from pT2 3.2 with DraIII and introduced into EcoRV-digested pSA to obtain pSA 5.7. pT2 3.2 and pD28 [25] were digested with XbaI to clone a non-coding DNA of 2.7 kb from pD28 into the transposon.
Two fragments of the first intron of the HPRT gene were PCR amplified and cloned into the pCR 2.1 plasmid (TOPO cloning kit, Invitrogen). The pT2 10 plasmid was obtained by ligating SpeI-digested pT2 CAGGS Venus with the NheI fragment of the amplified HPRT intron 1. The pT2 14 plasmid derives from pT2 10 digested with ClaI and ligated to the NotI fragment of the amplified HPRT intron 1. Finally, pT2 18 was obtained by ligating a third sequence amplified from the HPRT intron 1 with pT2 14 through EcoRI-restricted ends. The pSA 5.7 plasmid was digested with NheI and ligated to the NheI non-coding fragment of the HPRT gene to obtain pSA 9.7. pSA 9.7 was then digested with PmeI and ligated with a PvuII fragment of the HPRT intron 1 to obtain pSA 14. To enlarge pSA 14, a sequence amplified from intron 3 of the Lamb3 gene was introduced through EcoRV-compatible ends to obtain pSA 18.

Transfection-based transposition and calculation of transposition efficiency
HeLa and GABEB cells were both transfected with FugeneHD transfection reagent (Roche). For each sample, 2 µg of DNA were added to 100 µl of either DMEM (for HeLa) or EpiLife (for GABEB). The media used for this transfection reaction mix were not supplemented with FBS, L-Gln or Pen/Strep.
The transposon/transposase amounts of plasmid DNA were calculated to respect the stoichiometric ratio of 1:1 or, for transposons >10 kb, 2:1, in a total quantity of 2 µg. For the non-transposed control, 2 µg of transposon-only plasmid were used.
Each transfection reaction mix was complexed with 6 µl of FugeneHD (10 µl with SA and T2 18 in GABEB cells) and subsequently mixed by pulse-vortexing for a few seconds. The mixes were then left at room temperature for 10 min to allow the formation of lipoplexes. After the 10 min had elapsed, each mix was added drop by drop to a cell culture sample, which was subsequently incubated at 37 °C.
HeLa cells were transfected by the Calcium Phosphate method using 15 µg of 14- or 18-kb transposons mixed with the plasmid carrying the transposase expression cassette.
The percentage of Venus+ cells was determined 2 and 20-30 days post-transfection via flow cytometry and the transposition efficiency was calculated as: (Venus+ cells at 20-30 days post-transfection / Venus+ cells at Day 2) × 100. Cells that were transfected only with the transposon plasmid represented the control for background integration events.
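The efficiency formula above can be written as a small helper. This is only an illustrative sketch: the function name and the example percentages are hypothetical and are not data from this study.

```python
def transposition_efficiency(pct_venus_endpoint, pct_venus_day2):
    """Return the endpoint Venus+ percentage as a percentage of the Day-2 Venus+ cells.

    pct_venus_endpoint: % Venus+ cells at the endpoint (20-30 days post-transfection)
    pct_venus_day2:     % Venus+ cells 2 days post-transfection (transfection efficiency)
    """
    if pct_venus_day2 <= 0:
        raise ValueError("Day-2 Venus+ percentage must be positive")
    return 100.0 * pct_venus_endpoint / pct_venus_day2


# Hypothetical values for illustration only (not measurements from the paper).
example_efficiency = transposition_efficiency(pct_venus_endpoint=12.0, pct_venus_day2=40.0)
print(f"Transposition efficiency: {example_efficiency:.1f}%")
```

The same function can be applied to the transposon-only control to estimate the background of transposase-independent integration.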
Transposed clones were analysed via flow cytometry to determine the presence of doublets and the Venus mean fluorescence intensity (MFI).
HeLa cells were seeded at a concentration of 0.3 cells/well in a 96-well plate in DMEM medium supplemented with 10% FBS.
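The rationale for seeding at 0.3 cells/well can be illustrated with a short calculation. The sketch below assumes that the number of cells landing in each well follows a Poisson distribution; this is a common modelling assumption for limiting dilution and is not stated in the text, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k cells in a well, assuming Poisson-distributed seeding."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def seeding_summary(mean_cells_per_well, wells=96):
    """Expected number of occupied wells and the fraction of those holding a single cell."""
    p_empty = poisson_pmf(0, mean_cells_per_well)
    p_single = poisson_pmf(1, mean_cells_per_well)
    p_occupied = 1.0 - p_empty
    return {
        "expected_occupied_wells": wells * p_occupied,
        "fraction_clonal_among_occupied": p_single / p_occupied,
    }


summary = seeding_summary(0.3)
print(summary)
```

Under this assumption, about a quarter of the 96 wells are expected to contain cells, and roughly 86% of those wells start from a single cell.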

Southern blot analysis
Ten µg of genomic DNA, extracted from 1-5 × 10⁶ cells with a QIAamp DNA Mini kit (Qiagen), were digested overnight with NheI (SA 9.7-derived clones) and AflII (T2 10-derived clones) to verify the copy number of the transposed cassette, or with NcoI (SA 9.7-derived clones) and MfeI plus NdeI (T2 10-derived clones) to verify the integrity of the transposed cassette. Digested gDNA was run on a 0.8% agarose gel, transferred to a nylon membrane (Duralon, Stratagene) by Southern capillary transfer and probed with 2 × 10⁷ cpm of ³²P-labeled Venus probe according to standard techniques [26].

PCR screening for episomal SB vectors
About 100 ng of template gDNA were used in each PCR reaction. Primers amplifying the Amp resistance gene or the SB100X transposase (Table S1) were used to detect genomic integrations of the SA 9.7 backbone and of SB100X, respectively. PCR conditions were as follows: 30 s at 94 °C, 30 s at 58 °C and 30 s at 72 °C for 30 cycles.

LM-PCR and bioinformatic analysis
Integration sites were amplified by Linker-Mediated PCR (LM-PCR), as described [27]. Briefly, genomic DNA was extracted from 0.5-5 × 10⁶ transposed cells and digested with the MseI and XhoI enzymes to prevent amplification from internal mutated IR fragments. An MseI double-stranded linker was then ligated and LM-PCR was performed with nested primers specific for the linker and the SA IR/DR (Table S1).
LM-PCR derived amplicons were run on a Roche/454 GS FLX using titanium chemistries by GATC Biotech AG Next Gen Lab. A valid integration contained: the TAGpSAIR nested primer and the entire SA IR/DR sequence up to a TA dinucleotide.
Alignment pipeline. 31,603 sequencing reads were tested for the presence of the SA IR sequence and the TA dinucleotide. The SA IR and any primer sequences were trimmed, and the remaining reads starting with a TA dinucleotide were mapped to the human genome (hg19) using NCBI BLAST (blastn with default parameters). We kept only reads that mapped to a single genomic site with at least 90% sequence identity and an E-value of at most 0.05. Only reads that could be mapped from their 5′ end onwards were considered for further analysis. Redundant reads mapping to identical genomic positions were collapsed. This yielded 2,019 unique SA integration sites.
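The filtering criteria above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes the blastn hits are available in the standard 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), it interprets "a single genomic site" as exactly one hit passing the thresholds, and it keeps the strand when collapsing positions. The function name and the example rows are hypothetical.

```python
from collections import defaultdict


def unique_integration_sites(blast_rows, min_identity=90.0, max_evalue=0.05):
    """Filter tabular blastn hits and collapse them into unique integration sites."""
    hits_per_read = defaultdict(list)
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        read_id, chrom = fields[0], fields[1]
        pident, qstart = float(fields[2]), int(fields[6])
        sstart, send, evalue = int(fields[8]), int(fields[9]), float(fields[10])
        # Keep only hits passing the identity and E-value thresholds.
        if pident >= min_identity and evalue <= max_evalue:
            hits_per_read[read_id].append((chrom, qstart, sstart, send))

    sites = set()
    for hits in hits_per_read.values():
        if len(hits) != 1:      # read maps to more than one site: ambiguous, discard
            continue
        chrom, qstart, sstart, send = hits[0]
        if qstart != 1:         # alignment must begin at the 5' end of the read
            continue
        strand = "+" if sstart <= send else "-"
        sites.add((chrom, sstart, strand))   # the set collapses redundant reads
    return sorted(sites)


# Hypothetical hits for illustration only.
example_rows = [
    "r1\tchr1\t98.0\t50\t1\t0\t1\t50\t1000\t1049\t1e-20\t90",
    "r2\tchr1\t99.0\t50\t0\t0\t1\t50\t1000\t1049\t1e-22\t95",  # same site as r1
    "r3\tchr2\t95.0\t40\t2\t0\t1\t40\t500\t461\t1e-10\t60",    # r3 has two hits
    "r3\tchr5\t94.0\t40\t2\t0\t1\t40\t700\t739\t1e-9\t58",
    "r4\tchr3\t85.0\t50\t7\t0\t1\t50\t300\t349\t1e-8\t50",     # identity too low
    "r5\tchr4\t97.0\t45\t1\t0\t5\t49\t200\t244\t1e-15\t80",    # does not start at 5' end
]
example_sites = unique_integration_sites(example_rows)
print(example_sites)
```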
For the statistical analysis we generated 10,000 control sites in silico, taking into account the bias introduced by the LM-PCR technique. We first generated artificial reads starting with a TA dinucleotide of the human genome, such that the control sequences had both the length and the frequency of MseI restriction sites (TTAA) observed in the real sequencing reads. The artificial reads were then processed with the same mapping criteria used for the SA sites.
RM blast analysis. Analyses of repetitive elements were performed with RepeatMasker Blast (http://repeatmasker.org) [28]. To achieve reliable and comparable results we processed the raw sequences by trimming out the primer sequences used in LM-PCR and the IR/LTR/linker-specific sequences following the primers. The resulting reads were then trimmed to their first 40 nucleotides, and every sequence shorter than 40 nucleotides was discarded. Finally, we collapsed the reads that were either identical or differed by one mismatch. A two-sample test for proportions was used for pairwise comparison of the RE within the different datasets.
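A two-sample test for proportions can be sketched as below. The text does not specify which variant was used; this example assumes the common pooled two-sided z-test, and the counts are hypothetical and only serve to show the call.

```python
import math


def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Pooled two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Assumes both samples are large enough for the
    normal approximation to hold.
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if std_err == 0.0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: sequences annotated as a given repeat class in a
# library versus its weighted control.
z_stat, p_val = two_proportion_z_test(340, 1000, 140, 1000)
print(f"z = {z_stat:.2f}, p = {p_val:.3g}")
```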
For statistical analysis we created control sets as follows. We first randomly sampled 1 Million sequences 49 bp in length from the human reference genome (hg19). Then we discarded all sequences not starting with TA. The resulting set of 65,826 TA-weighted sequences was used as a background for Tneo SB integrations. For a second random control set we first randomly sampled 10 Million sequences of length 120 bp from the genome. Then we discarded all sequences not starting with TA, or either not containing the MseI restriction motif TTAA or having a TTAA within the first 39 bp of the sequence. After removing the part of the sequences following the first occurrence of TTAA, we received 292,917 sequences of lengths between 40 bp and 120 bp, which were weighted for TA and MseI and could be used as a background for SA integrations. We passed the generated sequences through the same filtering/trimming pipeline as the actual integration reads.
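The construction of the TA- and MseI-weighted control set described above can be sketched as follows. This is an illustration under stated assumptions rather than the script actually used: the genome is represented as one string, the function name is hypothetical, and the toy genome is random sequence instead of hg19. A window is kept if it starts with TA and its first TTAA ends at position 40 or later, and it is then cut right after that TTAA.

```python
import random


def ta_msei_weighted_controls(genome, n_samples, read_len=120, min_len=40, seed=0):
    """Sample windows, keep TA-starting ones and trim them after the first TTAA."""
    rng = random.Random(seed)
    controls = []
    for _ in range(n_samples):
        start = rng.randrange(len(genome) - read_len + 1)
        seq = genome[start:start + read_len].upper()
        if not seq.startswith("TA"):
            continue                       # not a potential SB target site
        cut = seq.find("TTAA")             # first MseI motif in the window
        if cut == -1 or cut + 4 < min_len:
            continue                       # no MseI site, or site within the first 39 bp
        controls.append(seq[:cut + 4])     # drop everything after the first TTAA
    return controls


# Toy random "genome" for illustration only; a real run would use hg19.
toy_rng = random.Random(1)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(200_000))
toy_controls = ta_msei_weighted_controls(toy_genome, n_samples=20_000)
print(len(toy_controls), "control sequences kept")
```

Every retained sequence is between 40 and 120 bp long, matching the length range reported for the SA background set.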
A third random control set of 45,235 genomic sequences weighted for MseI was adapted from Cattoglio et al. [29] and used as a background for MLV and HIV integrations.

Bidirectional PCR mapping on GABEB clones
Transposon integrations in GABEB clones were amplified by LM-PCR as described. PCR products were shotgun-cloned (TOPO TA cloning kit, Invitrogen) and then sequenced. Sequences between the TA and the linker primers were mapped onto the human genome with the BLAT genome browser (UCSC Human Genome hg19). Sequences featuring a unique best hit with ≥90% identity to the human genome were considered genuine integration sites. To confirm genuine integration in both directions, we designed primers on the genomic region hit and performed a direct PCR in conjunction with the pSAIR primer specific for the SA IR sequence (Table S1). The resulting amplicons were loaded on an agarose gel and checked for the expected length.

Efficiency of T2 and SA transposons
The sandwich (SA) transposon vector has a superior ability to transpose >10 kb transgenes with respect to the first-generation transposon when the SB10 transposase is provided [8]. Nevertheless, the T2 transposon, resulting from site-specific mutations in the IR sequences and insertion of a double TA flanking each IR, has been shown to have a four-fold enhanced activity over the first-generation transposon construct [6]. A side-by-side comparison of the SA and T2 transposons was needed to address the transposition efficiency of increasing DNA cargoes and to verify their molecular behaviour once integrated into the human genome.
We generated SA- and T2-based plasmids (SA 5.7 and T2 3.2; Figure 1), keeping the Venus reporter gene as the standard expression cassette. Increasing sizes of a non-coding human stuffer DNA (4, 8.3 and 12.3 kb in the SA plasmid; 6.8, 10.8 and 14.8 kb in the T2 plasmid) were introduced between the two IR/DRs to produce transposons of comparable length. For the sake of simplicity, we named these plasmids after the transposon construct type and the size of the transposable cassette expressed in kilobases (Figure 1).
Transposition experiments were performed in HeLa cells and in immortalized primary keratinocytes derived from patients affected by Generalized Atrophic Benign Epidermolysis Bullosa (GABEB), an inherited skin adhesion defect. All the experiments aimed at the identification of the integration efficiency of the IR-flanked transgene were measured by long-term Venus fluorescence in the absence of selective pressure. We co-transfected the SB100X transposase-expressing plasmid together with transposon plasmids in two different molar ratios (1:1 or 1:2) depending on the transposon length. Larger cargos required more transposon DNA to reach good transfection efficiency.
At least three independent experiments for each cell type and transposon were performed in order to reduce variability due to the transfection procedure. Mock-transfected HeLa and GABEB cells, and cells transfected with the T2 or SA Venus constructs alone were used as controls (in the absence of transposase, no transposition event should occur and residual reporter gene expression after long periods would only be attributable to noise or to rare random plasmid integration events). Transgene expression all along the culture period (up to 31 days) was measured via flow cytometry to follow the trend of the signal that persists in presence of SB100X and drops without the transposase ( Figure S1).
The transposition efficiency was normalized to the transfection efficiency (the number of cells that received the plasmids after transfection) and calculated as the ratio between the percentage of Venus+ cells at the endpoint (20-31 days) and the percentage of transfected cells 2-3 days after DNA delivery. The endpoint of each experiment was reached when the percentage of Venus+ cells in the sample transfected with the transposon alone stabilized below approximately 0.5%. Figure 2A and 2B show the transposition rate obtained in HeLa and GABEB cells. As previously reported [1,8], the transposition efficiency was inversely proportional to the transposon size. In HeLa cells, the transposition efficiency dropped 7.8-fold (from 58.5% to 7.5%) when the cargo payload was increased from 3.2 kb to 18 kb, independently of the transposon structure (T2 or SA). Interestingly, this size-dependent effect was less pronounced in GABEB cells. In this cell type the decrease was 1.8-fold (from 44% to 24%) for both T2 and SA, and the transposition rate for 18 kb transposons remained approximately 24%, compared with 7.5% in HeLa cells.

Clonal molecular analysis
Although we performed a molecular characterization of almost all T2 and SA vectors in HeLa or GABEB cells (Table S2), we focused our genomic analysis on relatively large T2 and SA transposon cassettes (10 kb) and on GABEB keratinocytes. Bulk populations of transposed cells were sorted for Venus expression 20-35 days post-transfection and cloned by limiting dilution. Genomic DNA extracted from each clone was first investigated by PCR for the presence of the transposon backbone and of the SB100X-expressing plasmid. Notably, we scored 14.8% of clones (8 out of 54) positive for the Ampicillin sequence present within the transposon backbone about 60 days post-transfection, while few (2 out of 54) of the analysed clones were positive for the SB100X sequence (Table S2). We next performed Southern blotting on the genomic DNA of 16 clones for each transposon type to determine the number of transgene copies harboured in the genome and their integrity. To this end, we digested the genomic DNA with AflII (T2 clones) or NheI (SA clones), which release fragments longer than 3.4 and 4.2 kb, respectively. Hybridization with a Venus-specific probe showed that most of the SA-treated samples (13 out of 16) carry a single integrated transposon; only 1 clone (#26) had 3 copies, and 2 out of 16 clones contained 2 copies (#8, #13), resulting in an average copy number of 1.3. Surprisingly, the 16 GABEB clones obtained with T2 harbour 1 to 7 copies, with an average of 3 integrated transposons per clone (Figure 3A). In general, we observed that the mean copy number is affected more by the transfection efficiency (Table S2) than by the size and type of the transposon.
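The average SA copy number can be re-derived from the per-clone counts given above (13 clones with 1 copy, 2 clones with 2 copies, 1 clone with 3 copies). The short check below is only a worked example of that arithmetic; the variable names are illustrative.

```python
# Copy number -> number of SA clones, as reported from the Southern blot analysis.
sa_copy_number_counts = {1: 13, 2: 2, 3: 1}

n_clones = sum(sa_copy_number_counts.values())
total_copies = sum(copies * clones for copies, clones in sa_copy_number_counts.items())
mean_copy_number = total_copies / n_clones

print(f"{total_copies} copies in {n_clones} clones: mean = {mean_copy_number:.2f}")
```

The mean works out to 20/16 = 1.25, which the text reports rounded to 1.3.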
Further restriction analysis performed with MfeI and NdeI on 9 T2 clones and with NcoI on 8 SA clones showed that all clones harbour the full-length transposon cassette ( Figure 3B). Among the 21 integrated transposons in the 9 T2 clones, only one, belonging to clone #3, is shorter than expected. None of the 13 integrated transposons in the 8 SA clones was rearranged.
To unequivocally prove that all the integration events mediated by SA transposition resulted from a genuine "cut and paste" mechanism, we mapped the insertion site at both transposon ends using an adapted version of Linker-Mediated PCR (LM-PCR) [27]. Ten Venus-expressing GABEB clones, derived from transposition of the SA 5.7 plasmid, were examined. Six integrants (#1, 4, 7, 13, 14, 16) belonging to 5 clones were bi-directionally mapped by LM-PCR. An additional 21 integrants were revealed by LM-PCR and confirmed by specific PCR on the genomic region flanking the opposite IR (Figure 3C). Importantly, almost all the integration events occurred without genomic rearrangements, deletions or insertions in the target sites. Only 2 out of 27 integrations (#26 and #27, belonging to clone 34) could not be bidirectionally confirmed.
Finally, we correlated the expression level of the reporter gene with the copy number of the transposon. The positional effect variegation primarily observed with retroviral and lentiviral vectors [30] could lead to the silencing of the therapeutic gene delivered by the vector. We asked whether the SB integrations would be affected by this phenomenon. We correlated the expression of the Venus protein, measured by Mean Fluorescence Intensity (MFI), with the copy number of either the SA or the T2 transposon, as determined by Southern blot or q-PCR analyses of 62 GABEB clones. For comparison, we analysed the MFI of a GFP reporter gene, driven by the human Keratin 14 promoter, in 70 HaCaT clones isolated upon LV transduction. A linear correlation curve was traced to retrieve the R² coefficient of determination. Transposon samples show an R² = 0.759 with a statistically defined correlation between the two variables (P N = 0.6). LV samples display an R² = 0.001 with a null defined correlation (Figure 4). Independent analysis of transposed clones obtained in different cells (HaCaT and GABEB) and carrying a reporter gene driven by the PGK or CAGGS promoter showed comparable results, indicating a common direct correlation between MFI and copy number (data not shown). We conclude that SB integrants tend to express their cargo faithfully, and that multi-copy integrants express in a copy-number-dependent manner, consistent with earlier observations [31].
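The coefficient of determination of a linear fit between copy number and MFI can be computed as in the sketch below. It assumes an ordinary least-squares line with an intercept, which the text does not state explicitly, and the clone values are hypothetical placeholders rather than measurements from this study.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical clones: transposon copy number and Venus MFI (arbitrary units).
copy_numbers = [1, 1, 2, 2, 3, 4]
mfi_values = [110.0, 95.0, 190.0, 215.0, 310.0, 380.0]
slope, intercept, r_squared = linear_fit_r2(copy_numbers, mfi_values)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, R^2 = {r_squared:.3f}")
```

An R² close to 1 indicates that expression scales with copy number, whereas a value near 0 (as reported for the LV clones) indicates no linear relationship.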

Integration pattern analysis
In the last few years, several papers described the integration profile of the SB, piggyBac (PB), and Tol2 transposons [20-23,32-36]. Here we report the integration profile and preference of the sandwich transposon compared with the first-generation SB transposon [20] in human epithelial cells. To generate a library of SA integration events, we transfected 20 million GABEB cells with SA transposon- and SB100X-carrying plasmids. The Venus-positive cells (20%) were sorted three days after transfection to enrich the population expressing the reporter gene. A 90%-pure sorted population was kept in culture for 3 weeks to dilute the unintegrated SA vector, reaching a stable 78% Venus+ bulk population. We used LM-PCR and pyrosequencing to generate 6,084 non-redundant SA-linked genomic sequences in human immortalized GABEB keratinocytes. The Blast alignment retrieved 2,019 unambiguously mapped integration sites. As a control, 10,000 random unique sequences were generated in silico, balancing the biases introduced by the LM-PCR (amplicon length and MseI proximity) and the availability of TA dinucleotides in the genome. In the analysis we also annotated a large dataset (59,169 hits) generated in HeLa cells transposed with the first-generation Tneo transposon and selected for 2 weeks with neomycin [20]. The integration sites and control sites were annotated as transcriptional start site (TSS)-proximal when mapping in the ±2.5 kb window around a TSS, intragenic when mapping within a transcription unit, and intergenic in all other cases. Among SA integrations, 58.6% were in an intergenic position, 38.9% were within the transcribed portion of at least 1 gene, and 2.5% were within a 5 kb window encompassing the TSS (Figure 5A; the complete list of sequences is available in the GenBank database under accession number SRP047118).
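The three annotation classes can be illustrated with a small function. This is a sketch under assumptions: gene models are given as (chromosome, start, end, strand) tuples, the TSS is taken as the strand-aware 5′ end of the transcription unit, and a site that is both within the TSS window and inside a gene is counted as TSS-proximal. That precedence is inferred from the three percentages summing to 100% and is not stated explicitly. The gene coordinates are hypothetical.

```python
def annotate_site(chrom, pos, genes, tss_window=2500):
    """Classify an integration site as TSS-proximal, intragenic or intergenic.

    genes: iterable of (chrom, start, end, strand) transcription units, start <= end.
    """
    on_chrom = [g for g in genes if g[0] == chrom]
    # TSS-proximal takes precedence (assumption): within +/- tss_window of any TSS.
    for _, start, end, strand in on_chrom:
        tss = start if strand == "+" else end
        if abs(pos - tss) <= tss_window:
            return "TSS-proximal"
    for _, start, end, _ in on_chrom:
        if start <= pos <= end:
            return "intragenic"
    return "intergenic"


# Hypothetical gene models for illustration only.
example_genes = [("chr1", 10_000, 50_000, "+"), ("chr2", 1_000, 9_000, "-")]
example_calls = [
    annotate_site("chr1", 11_000, example_genes),   # just downstream of the TSS
    annotate_site("chr1", 30_000, example_genes),   # inside the gene body
    annotate_site("chr1", 80_000, example_genes),   # far from any gene
    annotate_site("chr2", 9_500, example_genes),    # upstream of a minus-strand TSS
]
print(example_calls)
```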
In general, the distribution of the SB integrants in both datasets is fairly random and resembles the composition of the human genome, showing no statistical differences compared to their relative controls, i.e. all p-values (both two-sample tests for proportions and Fisher's Exact Tests) were >10⁻².
We then analysed the frequency of human repetitive elements in the transposon libraries, SA and Tneo [20], and in their relative weighted controls using the RM Blast browser [28]. For comparison we also analysed two viral-derived integration datasets (MLV and HIV) generated in human CD34+ multipotent hematopoietic progenitor cells (HPCs) [29] and their control library weighted for MseI restriction site distribution. The raw data generated by deep sequencing of the LM-PCR (applied to SA-, MLV- and HIV-treated cells) and LAM-PCR (applied in [20]) products were filtered and trimmed in order to rescue the genuine integration events (see Materials and Methods). After filtering and trimming we retrieved 6,084 and 165,887 unique sequences in the SA and first-generation vector libraries, respectively, and 37,873 and 31,204 unique sequences from the MLV and HIV datasets, respectively. We generated large control datasets taking into account the bias introduced by the respective technique. In particular, from the hg19 genome database we retrieved 45,235 control reads weighted for MseI, 65,570 sequences weighted for the presence of the TA dinucleotide hit by the SB transposons, and 209,913 sequences MseI- and TA-double weighted.
The RM Blast analysis revealed an overrepresentation of REs in the SA integrations (34%) with respect to the TA- and MseI-weighted control (14%) and to all the other datasets analysed (Figure 5B). In particular, satellite, small RNA and LINE elements were enriched in the SA library (24-, 7.6- and 3.5-fold increase over the background, respectively), whereas in the first-generation vector library only a slight increase in the satellite and simple repeat elements was measurable (3.5- and 2.6-fold over the background, respectively); a comparable LINE frequency was detected.
Besides the higher frequency in the satellite elements, the two SB transposon datasets share a slight under-representation of SINE, LTR and DNA transposable elements in comparison with their random control libraries. We introduced MLV and HIV libraries to compare the frequency of integration into RE generated by a retroviral integrase-mediated integration mechanism. The RM Blast analysis pointed out that viral vectors disfavour integration in RE (14-16% vs 24%), and, in particular, satellite, LTR and LINE elements are underrepresented. These data clearly confirm a difference in the integration site selection between viral vectors and SB transposons and identify new signatures in the SA integrome that should be taken into consideration when using them as tools for genetic manipulation.

Discussion
The SB transposon IRs were mutated to improve their capacity to be mobilized but, to date, there has been no direct comparison defining the genetic characteristics of the T2 and SA IRs [8]. In this study, HeLa cells and GABEB keratinocytes [24] were transfected with a panel of T2 or SA transposons carrying Venus expression cassettes of increasing size, in combination with the SB100X plasmid (Figure 1). The transfection rate was higher in HeLa than in GABEB cells (Figure S1) and the transposition efficiency was inversely proportional to the transposon size (Figure 2). Interestingly, HeLa cells were more severely affected by the transposon size than primary immortalized cells. These results suggest that the transposase activity could be favoured by some cellular factor differentially expressed in GABEB and HeLa cells. Nonetheless, T2 and SA constructs carrying cargos of comparable size showed similar transposition efficiency in both cell lines. From these data we can conclude that the T2 IR construct is interchangeable with the SA construct, with some advantages: T2 has shorter IRs and could therefore accept a larger cargo cassette.
Transposed GABEB and HeLa populations were subjected to limiting dilution to obtain single-cell-derived expansions. The derived clones were employed to characterize several molecular parameters: transposon-independent insertion, copy number, genomic stability, faithful transposition activity, and correlation between copy number and expression of the integrated cassette. The SB100X sequence was retrieved in 6 out of 211 analysed clones, while almost 14% of the clones (30 clones) were found positive for the transposon backbone sequences (Table S2). We hypothesize that the plasmid backbone carrying the transposon could have some advantage in remaining episomal or in integrating into the genome. The transposon excision step leaves the plasmid backbone with a double-strand break, which induces recruitment of the endogenous repair machinery and integration into the cell genome. We also analysed the copy number of the clones. Figure 3A shows an average of 1.3 SA copies/clone, while the T2 copy number spans from 1 to 7 transposons with an average of 3 copies. This difference mostly depends on the transfection efficiency, as confirmed by the analysis of the other transposed cell populations generated in this study. Therefore, it is possible to fine-tune this parameter by adjusting the ratio of the two SB components used for transfection or, as previously reported, to bypass the transfection procedure through viral delivery of the transposase and transposon by adenoviral vectors [37], integration-defective lentiviral vectors [38,39], retroviral particles [40] and adeno-associated vectors [41].
We were able to associate copy number of the transposon with the expression level of the Venus fluorescence gene. Mean Fluorescence Intensity does follow a direct proportion with the copies harboured ( Figure 4). In contrast, expression of the reporter gene in lentiviral-mediated integrants does not correlate with copy number and is more subjected to the activity of surrounding genomic sequences [42,43].
Next, the integrated transposons in these clones were also analysed for their integrity via Southern Blot. Retroviral and lentiviral vectors can rearrange during the reverse transcription step resulting in partially-deleted integrated proviruses, a frequent occurrence in transgene hosting repetitive sequences [44,45]. The SB mediated integration, by contrast, does not require reverse transcription and thus is expected to preserve the integrity of the transgene. Ninety-eight percent of the integrants, resulting from T2 and SA transposition, have a correct size (Figure 3 B).
The sandwich transposon has a doubled IR/DR structure at both ends, with 8 transposase binding sites in total. In principle, every transposase unit bound to one DR site could interact with the others to create different chiasm geometries (also described in [6]); some of these conformations could modify the integration activity, resulting in chromosomal aberrations. To investigate the fidelity of the transposition process, 10 GABEB clones were mapped bi-directionally by LM-PCR and the transposon-genome junctions were amplified by site-specific PCR. Twenty-five integrations out of 27 (92.6%) were validated as canonical transposition events with the TA target site duplication signature at both ends (Figure 3C). Two integrations mapped by LM-PCR were not confirmed at the opposite transposon end, suggesting rearrangements, probably caused by the repair mechanism acting at the transposition break.
LM-PCR was also employed to derive a high-definition map of SA/SB100X integration sites in the genome of a transposed GABEB bulk population. This analysis is commonly applied to integrating vectors (e.g. retroviral and lentiviral vectors) because it allows the evaluation of genotoxicity [46,47] and helps to understand the molecular mechanisms driving integration towards specific regions of the genome [29,48-50]. The technique returned 2,019 unambiguously mappable SA integration sites randomly distributed throughout the human genome, in accordance with previously published data on the first-generation transposon [20,23] (Figure 5A). For gene therapy purposes, the SB system results in a safer integration profile compared to other integrating vectors such as the Tol2 and PB transposons and retroviral vectors [20-23,32,33], which favor TSS-proximal regions or gene body sequences.
Although the integration site distribution in relation to genes was found to be close to random, the RM Blast analysis shows a significantly biased distribution of SA integrations in repetitive elements (RE), particularly in satellite, LINE and small RNA genes. These genomic regions may be favourable for integration due to their base composition (TA-richness), or there might be molecular mechanisms that actively recruit the transposon/transposase complex to specific RE sites [51-55].
Curiously, the frequency of RE in the first-generation transposon library and in its weighted control was comparable. Unlike the SA library (obtained in 80% Venus-expressing immortalized keratinocytes without selective pressure), the first-generation transposon library derives from transposed HeLa cells [56] selected for two weeks by antibiotic resistance. This culture condition could negatively select those integrations landing in poorly expressed genomic loci or in heterochromatin regions [35]. Nevertheless, the first-generation transposon integrations were slightly increased in satellite regions and SINE, whereas LTR and DNA elements were underrepresented compared to the background.
These data identify some common features in SB datasets. Conversely, MLV and HIV-derived viral vectors disfavour integration in RE (satellite, LTR and LINE accordingly also to [57]) suggesting an active role of viral integrase in the selection of integration sites that could better support the expression, replication and survival of the viral progeny. The genomic features newly identified in the SA integrome raise an interesting matter that needs to be deeply investigated for future application.