Towards a Safer, More Randomized Lentiviral Vector Integration Profile Exploring Artificial LEDGF Chimeras

The capacity to integrate transgenes into the host cell genome makes retroviral vectors an interesting tool for gene therapy. Although stable insertion resulted in successful correction of several monogenic disorders, it also underlies insertional mutagenesis, which caused a major setback in otherwise successful clinical gene therapy trials when a subset of treated patients developed leukemia. Despite improvements in vector design, their use is still not risk-free. Lentiviral vector (LV) integration is directed into active transcription units by LEDGF/p75, a host-cell protein co-opted by the viral integrase. We engineered LEDGF/p75-based hybrid tethers in an effort to elicit a more random integration pattern that increases biosafety and potentially reduces proto-oncogene activation. We therefore truncated LEDGF/p75 by deleting the N-terminal chromatin-reading PWWP domain and replaced this domain with alternative pan-chromatin binding peptides. Expression of these LEDGF-hybrids in LEDGF-depleted cells efficiently rescued LV transduction and resulted in a more random distribution of LV integration sites throughout the host-cell genome. In addition, when considering safe harbor criteria, LV integration sites obtained with these LEDGF-hybrids were distributed more safely than LEDGF/p75-mediated integration sites in wild-type cells. This approach should be broadly applicable to introduce therapeutic or suicide genes for cell therapy, such as in patient-specific iPS cells.


Introduction
The capacity to integrate transgenes into the host cell genome makes retroviral vectors (RV) an interesting tool for gene therapeutic applications, as stable insertion of transgenes into the genome ensures long-term expression. RV-mediated gene transfer has successfully cured several monogenic primary immunodeficiency disorders [1][2][3]. Yet stable insertion occasionally altered endogenous gene regulation, resulting in insertional mutagenesis. This proved a major setback: in otherwise successful clinical gene therapy trials, 5 out of 19 patients treated for X-SCID developed leukemia, and 2 out of 2 patients treated for X-CGD acquired myelodysplastic syndrome [3][4][5][6]. Both trials employed murine leukemia virus (MLV)-based gammaretroviral vectors (γRV), which integrate in close proximity to gene regulatory regions [7][8][9], and the resulting insertions caused transcriptional deregulation such as up-regulated LMO2 expression [10][11][12][13]. Similar reports of insertional mutagenesis were published after γRV integration near CCND2, BMI1 and EVI1 [14,15]. Despite improvements in vector design (e.g. self-inactivating (SIN) vectors), their use is still not risk-free [3,4,6,14-16], which shifted attention from γRV towards HIV-derived lentiviral vectors (LV). Even though LV display a more favorable integration pattern, aberrant splicing [17,18] and insertional mutagenesis remain major concerns, as clonal expansion was observed in a gene therapy trial for β-thalassemia [19]. In addition, two recent independent studies revealed clonal expansion in HIV-1-infected patients on antiretroviral therapy caused by insertional mutagenesis triggered by the HIV-1 provirus [20,21]. Retroviral integration is a non-random process that, depending on the viral genus, is associated with specific chromatin marks and genomic features [22][23][24].
γRV predominantly integrate in the vicinity of gene regulatory regions, whereas LV preferentially target the body of active transcription units [10,25]. Integration is catalyzed by the viral integrase (IN), whereas integration site selection is attributed to the cellular chromatin readers co-opted by the viral IN. The bromodomain and extra-terminal domain (BET) family of proteins (BRD2, 3 and 4) guides MLV integration [26][27][28], whereas LV integration is directed by lens epithelium-derived growth factor p75 (LEDGF/p75) [29,30]. Both function as molecular tethers in the cell, combining a chromatin-binding and a protein-interacting region (reviewed in [31]). In LEDGF/p75 (Fig 1A), the chromatin-binding part comprises an N-terminal Pro-Trp-Trp-Pro (PWWP) epigenetic reader domain (aa 1-93), which recognizes H3K36me3 chromatin marks [32][33][34][35][36], and a set of DNA-binding motifs (Fig 1A, [37,38]). Together, these elements allow LEDGF/p75 to explore the chromatin in a dynamic scan-and-lock fashion [39]. Although its cellular role is not fully understood, LEDGF/p75 clearly acts as a molecular hub for a variety of endogenous proteins besides the lentiviral integrase (Fig 1A) [40-44]. All these proteins, including the lentiviral integrase, bind the C-terminal integrase-binding domain (IBD, aa 347-429; Fig 1A) of LEDGF/p75. We and others showed that replacing the N-terminal LEDGF/p75 DNA-binding region (aa 1-325) with alternative DNA-binding domains retargets LV integration towards genomic loci bound by these domains [35,45-47]. Fusion of the heterochromatin-binding chromobox protein homolog 1 (CBX1) to the IN-binding C-terminal end of LEDGF/p75 shifted LV integration into the cognate H3K9me-marked chromatin environment, i.e. pericentric heterochromatin and intergenic regions [46].
Despite integration in regions enriched in epigenetic marks associated with gene silencing, transgene expression remained efficient and resulted in successful phenotypic correction in a cell model for X-CGD [48].
Here we aimed to develop a LEDGF-based tether that results in a more random integration pattern to reduce the overall risk of insertional mutagenesis [11,49-51]. First, we truncated LEDGF/p75 by deleting the N-terminal chromatin-reading PWWP domain, which binds H3K36me3 marks and thereby directs LEDGF/p75 into the body of active transcription units (Fig 1A and 1B). In addition, we replaced the PWWP domain with three alternative viral protein domains and motifs, described in the literature as pan-chromatin recognition peptides because they bind cellular chromatin without sequence specificity (Fig 1B; Fig 2). Several viruses reside in host cells as episomal DNA genomes and have evolved defined chromatin-binding motifs to persist through mitosis. The spumavirus prototype foamy virus (PFV) contains a 13-amino-acid motif in the group-specific antigen (Gag) that binds H2A/H2B in the nucleosome core [52][53][54]. Likewise, the Kaposi sarcoma-associated herpesvirus (KSHV) genome is tethered to the nucleosomal core via a chromatin-binding sequence (CBS) at the N-terminal end of the latency-associated nuclear antigen (LANA) [55]. Finally, in beta-papillomaviruses (PV), a conserved motif in the E2 hinge promotes binding to chromatin and to mitotic chromosomes of the infected cell [56][57][58]. Following the generation of stable cell lines, we …

… gene and a LEDGF-specific miRNA-based shRNA was described earlier [60] and used to generate stable LEDGF KD cells. All LEDGF/p75 hybrid expression constructs were cloned into the pGAE backbone, and all cloning steps were verified by restriction digest and sequencing.

Retroviral vector production (SIV-based) and transduction
Lentiviral vector production was performed as described earlier [61]. Briefly, to generate vesicular stomatitis virus glycoprotein (VSV-G)-pseudotyped SIV-based lentiviral vectors, HEK 293T cells were transfected with the SIV-specific packaging plasmid (pAd_SIV3+; gift from D. Nègre, Lyon, France), the VSV-G envelope plasmid (pLP-VSVG #646 B, Invitrogen) and the respective transfer plasmids, using polyethylenimine (PEI; Polysciences, Amsterdam, The Netherlands). The supernatant was collected, filtered through a 0.45 μm filter (Corning Inc., Seneffe, Belgium) and concentrated using a Vivaspin 15 50,000 MWCO column (Vivascience, Bornem, Belgium). The vector-containing concentrate was aliquoted in 50 μl portions and stored at -80°C. Stable cell lines expressing a LEDGF hybrid were generated by transducing polyclonal LEDGF/p75 KD cells with SIV-based vectors, followed by selection with 0.0003% w/v blasticidin (Invitrogen). For lentiviral transduction experiments (LV eGFP T2A fLuc), cells were transduced overnight (ON). At 72 hours post-transduction, cells were harvested when 90% confluent and used for eGFP FACS analysis or luciferase activity measurements. The remainder of the transduced cells was cultivated for at least 20 days to eliminate non-integrated DNA and submitted for integration site sequencing.
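A percentage w/v denotes grams of solute per 100 ml. As a quick sanity check on the selection concentration given above, the minimal sketch below converts % w/v into μg/ml; the helper name is hypothetical and only illustrates the arithmetic (0.0003 g per 100 ml = 3 μg/ml).

```python
def percent_wv_to_ug_per_ml(percent_wv: float) -> float:
    """Convert a % w/v concentration (grams per 100 ml) into micrograms per ml.

    Hypothetical helper for illustration only.
    """
    grams_per_ml = percent_wv / 100.0   # % w/v means g per 100 ml
    return grams_per_ml * 1_000_000.0   # 1 g = 1,000,000 micrograms


# Blasticidin selection concentration stated in the text (0.0003% w/v)
blasticidin_ug_per_ml = percent_wv_to_ug_per_ml(0.0003)
print(round(blasticidin_ug_per_ml, 6))  # approximately 3 micrograms per ml
```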

Luciferase assay
Cells were transduced with LV eGFP T2A fLuc and lysed in 70 μl of lysis buffer (50 mmol/l Tris pH 7.5, 200 mmol/l NaCl, 0.2% NP40, 10% glycerol). FLuc activity was determined using the ONE-Glo luciferase assay system according to the manufacturer's protocol (Promega, Leiden, The Netherlands) and normalized to the total protein concentration in order to correct for differences in metabolic state. The total protein concentration was measured in parallel using a bicinchoninic acid (BCA) protein assay (Pierce, Aalst, Belgium).
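The normalization described above divides the luminescence signal by the amount of protein in the assayed lysate. The sketch below is a minimal illustration, assuming the BCA result is expressed in μg/μl and that the assayed lysate volume is known; the function name and all numbers are hypothetical, not data from this study.

```python
def luciferase_per_ug_protein(rlu: float, protein_ug_per_ul: float, volume_ul: float) -> float:
    """Normalize a luminescence reading (relative light units, RLU) to the
    total protein (micrograms) present in the assayed lysate volume.

    Hypothetical helper; assumes the same lysate volume was used for both readouts.
    """
    protein_ug = protein_ug_per_ul * volume_ul
    if protein_ug <= 0:
        raise ValueError("total protein must be positive")
    return rlu / protein_ug


# Hypothetical example: 1.2e6 RLU measured in 20 ul of lysate containing 1.5 ug/ul protein
normalized = luciferase_per_ug_protein(1.2e6, 1.5, 20.0)
print(normalized)  # 40000.0 RLU per ug protein
```

Expressing the result per μg of protein makes lysates with different cell numbers comparable; ratios between conditions can then be taken on the normalized values.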

Flow cytometric analysis
Cells were transduced with LV eGFP T2A fLuc and harvested when 95% confluent. eGFP/YFP fluorescence was measured by flow cytometry (FACS, fluorescence-activated cell sorting) on a FACSCalibur flow cytometer (BD Biosciences, Erembodegem, Belgium). Data analysis was performed with the CellQuest Pro software (BD Biosciences, Erembodegem-Aalst, Belgium). The percentage of eGFP-positive cells (% of gated cells) multiplied by the mean fluorescence intensity (MFI) is referred to as the overall transduction efficiency.
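The overall transduction efficiency defined above is a simple product of two flow cytometry readouts. A minimal sketch follows; the function names and example values are hypothetical and serve only to show the arithmetic, including an optional comparison to a reference condition.

```python
def overall_transduction_efficiency(percent_positive: float, mfi: float) -> float:
    """% eGFP-positive cells (of gated cells) multiplied by the mean fluorescence intensity."""
    if not 0.0 <= percent_positive <= 100.0:
        raise ValueError("percent_positive must lie between 0 and 100")
    if mfi < 0:
        raise ValueError("MFI cannot be negative")
    return percent_positive * mfi


def relative_efficiency(sample: float, reference: float) -> float:
    """Express one condition's efficiency as a fraction of a reference condition."""
    return sample / reference


# Hypothetical readouts, not data from this study
reference = overall_transduction_efficiency(45.0, 120.0)  # 5400.0
test = overall_transduction_efficiency(30.0, 60.0)        # 1800.0
print(relative_efficiency(test, reference))
```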

Integration site amplification and sequencing
Transduced HeLaP4 cells were cultivated for at least 20 days to eliminate non-integrated DNA and harvested when ca. 90% confluent. Genomic DNA was extracted using the GenElute Mammalian Genomic DNA miniprep kit (Sigma-Aldrich). Integration sites were amplified by linker-mediated PCR as described previously [30]. Genomic DNA was digested with MseI and linkers were ligated (S1 Fig). Proviral-host junctions were amplified by nested PCR using barcoded primers, generating 454 libraries; this enabled pooling of PCR products into one sequencing reaction. Products were gel-purified and sequenced by 454/Roche pyrosequencing (Titanium technology, Roche) on the 454 GS FLX instrument at the University of Pennsylvania. Reads were filtered based on a perfect match to the LTR linker, barcode and flanking LTR. All sites were mapped to the human genome, requiring a perfect match within 3 bp of the LTR end. For each experimental site, three random control sites were computationally generated and matched with respect to the distance to the nearest MseI cleavage site (matched random controls, MRC). A more detailed explanation can be found in the supplementary guidelines of [63]. Normalizing experimental HIV-derived lentiviral vector sites to the MRC sites controls for recovery bias due to restriction enzyme cleavage. Analysis was performed as described previously [30], and genomic heat maps were generated using the INSIPID software (Bushman Lab, University of Pennsylvania). A detailed guide to interpreting the heat maps can be found in [63]. DNase I site density was computed from a table of DNase I sites obtained from [64]. Datasets used in the safe harbor analysis were retrieved from Ensembl and/or UCSC (TxDB knownGenes, miRNA biotype, UCR; hg19) using BioMart [65]. The Allonco list, as published in [66], was used for oncogenes.
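The matched random control idea described above can be illustrated on a toy sequence. The sketch below is a simplified stand-in, not the published pipeline: it assumes a single linear sequence, ignores strand and LTR orientation, and places each control at the same distance from a randomly chosen MseI site (5'-TTAA-3') as the experimental site, accepting it only if that MseI site is still the nearest one. All function names are hypothetical.

```python
import bisect
import random
import re


def msei_sites(seq: str) -> list[int]:
    """0-based start positions of every MseI recognition site (5'-TTAA-3').

    TTAA is palindromic, so scanning one strand finds sites on both strands.
    """
    return [m.start() for m in re.finditer(r"(?=TTAA)", seq.upper())]


def distance_to_nearest(pos: int, sites: list[int]) -> int:
    """Distance (bp) from pos to the closest MseI site; sites must be sorted."""
    if not sites:
        raise ValueError("no MseI sites in sequence")
    i = bisect.bisect_left(sites, pos)
    neighbours = sites[max(i - 1, 0):i + 1]  # closest site lies left or right of pos
    return min(abs(pos - s) for s in neighbours)


def matched_random_controls(site, sites, seq_len, n=3, rng=None, max_tries=100_000):
    """Draw n random positions whose nearest-MseI distance equals that of `site`."""
    rng = rng or random.Random()
    d = distance_to_nearest(site, sites)
    controls = []
    for _ in range(max_tries):
        if len(controls) == n:
            break
        anchor = rng.choice(sites)
        candidate = anchor + rng.choice((-d, d))
        # reject positions off the sequence or closer to another MseI site
        if 0 <= candidate < seq_len and distance_to_nearest(candidate, sites) == d:
            controls.append(candidate)
    if len(controls) < n:
        raise RuntimeError("could not place enough matched controls")
    return controls


# Toy sequence with two MseI sites; hypothetical experimental site at position 110
toy = "GC" * 50 + "TTAA" + "GC" * 50 + "TTAA" + "GC" * 50
sites = msei_sites(toy)  # [100, 204]
controls = matched_random_controls(110, sites, len(toy), rng=random.Random(0))
print(controls)
```

Matching on restriction-site distance means experimental and control sites share the same recovery bias, which is why comparisons against MRC rather than uniformly random positions are used.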

LEDGF-hybrids locate to the nucleus and display a distinct subnuclear distribution
In a first step, we evaluated the subcellular localization of the truncated ΔN93-LEDGF and the respective ΔN93-LEDGF-hybrids by immunocytochemistry (Fig 3). Complementation of LEDGF-depleted HeLaP4 cells (LEDGF KD) with LEDGF BC resulted in a typical pattern of dense, fine speckles in the nucleoplasm, excluded from the nucleoli during interphase (Fig 3C), phenocopying the endogenous LEDGF/p75 pattern (Fig 3A), in line with earlier reports [46]. In contrast, LEDGF/p75 lacking the chromatin-reading PWWP domain (ΔN93-LEDGF, Fig 3D) exhibited a more diffuse nuclear distribution and also localized to the nucleoli. All ΔN93-LEDGF peptide fusions localized to the nucleus (Fig 3E-3H), each displaying a unique subnuclear distribution: the PFV Gag(534-546)- and KSHV LANA(1-31)-ΔN93-LEDGF fusions showed a punctate nuclear appearance and were excluded from nucleoli (Fig 3E and 3H), whereas both the HPV5 E2(242-257)- and HPV8 E2(240-255)-ΔN93-LEDGF fusions were enriched in the nucleoli (Fig 3F and 3G). Similar subcellular distributions were observed for the respective cognate LEDGF D366N-hybrids (data not shown).

LEDGF-peptide fusions efficiently redistribute lentiviral integration
After showing that complementation of LEDGF/p75-depleted cells with ΔN93-LEDGF or any of the LEDGF-hybrids rescued vector integration, we determined the integration profiles in the respective cell lines. HIV-based viral vector integration sites were amplified and sequenced as described earlier [30,46], yielding a total of 62,670 unique integration sites and their computationally generated matched random control (MRC) sites. Note that SIV-based viral vectors were used to complement LEDGF/p75-depleted cells, in order to avoid interference with the amplification and analysis of HIV-based vector integration sites. First, we analysed integration relative to a set of defined genomic features (Fig 5, Fig 6). Lentiviral vector integration in wild-type HeLaP4 cells (endogenous LEDGF/p75, Fig 6) is typically enriched in the body of transcription units (75.0% in RefSeq genes; Fig 5) and disfavors transcription start sites (TSS) and promoter regions (2.0% within 2 kb of the 5' end of a RefSeq gene and 3.1% within 2 kb of a CpG island) [10,25]. LEDGF depletion results in a more random integration site distribution, characterized by reduced integration into genes (51.0% in RefSeq genes) and increased integration close to TSS (5.4%) and CpG islands (7.0%), in line with previous work [30,46,70]. This phenotype was fully reverted upon LEDGF/p75 complementation (LEDGF BC; 75.6% in RefSeq genes). Comparable data were obtained for larger window sizes (only the 2 kb and 4 kb windows are shown in Fig 5). Integration site distributions in cells expressing the respective LEDGF D366N mutants did not differ from those in LEDGF KD cells (n = 16473; data not shown). Interestingly, the mere ablation of the PWWP domain (ΔN93-LEDGF) resulted in an overall more random distribution than in LEDGF KD cells, with decreased integration near features associated with retroviral integration, such as gene bodies, TSS and promoter regions (***p<0.001; χ2 test compared to LEDGF KD; Fig 5).
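The χ2 comparisons above ask whether the fraction of sites falling inside a feature (e.g. RefSeq genes) differs between two datasets. A minimal sketch of a Pearson χ2 test on such a 2×2 table follows, assuming no continuity correction; for 1 degree of freedom the p-value equals erfc(sqrt(χ2/2)). The counts are hypothetical, not data from this study, and the function name is illustrative only.

```python
import math


def chi_square_2x2(in_a: int, out_a: int, in_b: int, out_b: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) comparing the fraction
    of integration sites inside a genomic feature between dataset A and dataset B.

    Returns (chi2, p_value).
    """
    table = [[in_a, out_a], [in_b, out_b]]
    n = in_a + out_a + in_b + out_b
    row_totals = [sum(row) for row in table]
    col_totals = [in_a + in_b, out_a + out_b]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be non-zero")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 510/1000 sites in genes for dataset A, 750/1000 for dataset B
chi2, p = chi_square_2x2(510, 490, 750, 250)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```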
Complementation of LEDGF-depleted cells with LEDGF-peptide fusions resulted in a comparable, more randomized distribution (***p<0.001; χ2 test compared to LEDGF KD; Fig 5). In a more elaborate analysis, we examined global integration preferences across a wide selection of genomic features, depicted as a genomic heat map (Fig 6), comparing integration site data sets obtained from HeLaP4 LEDGF KD cells to those of cells complemented with the respective LEDGF-hybrids. Tile color depicts the correlation of an integration dataset with the respective genomic feature (left) relative to matched random controls, as indicated by the colored receiver operating characteristic (ROC) curve area scale at the bottom of the panel. LEDGF/p75 depletion shifts integration out of transcriptionally active regions, which is reverted upon complementation with LEDGF/p75 (compare LEDGF KD and LEDGF BC in Fig 6), in line with previous data [30,46,70]. Cells complemented with ΔN93-LEDGF displayed a more randomly distributed integration profile, with tiles overall colored less red or blue than for LEDGF KD, and integrated less near DNase-sensitive regions, CpG islands and GC-rich regions than LEDGF KD (***p<0.001, Wald statistics). Replacing the PWWP domain with the heterologous HPV E2 and LANA(1-31) peptide fragments resulted in a ΔN93-LEDGF-like integration profile when compared to LEDGF KD (p<0.001), whereas integration for PFV Gag(534-546)-ΔN93-LEDGF was less random. When displaying statistics relative to … In addition to integration relative to genomic features, we also analyzed integration site densities near epigenetic features (Fig 6B). The epigenetic heat map displays yellow and blue tiles: blue tiles indicate that integration frequency near these marks is enriched relative to MRC, whereas yellow tiles indicate that integration is disfavored compared to MRC.
A near-random distribution results in a black tile. As reported previously, lentiviral integration correlates with histone marks associated with open and transcriptionally active chromatin (H3K4 mono-, di- and trimethylation, H3K14 and H4 acetylation, as well as acetylation and monomethylation of H3K9/K27/K79, H4K20 and H2BK5, etc.) [8], whereas integration in transcriptionally silent regions or heterochromatin (H3K27me3, H3K9me3 or H4K20me3 and H3K79, respectively) is disfavored [8] (WT; Fig 6). Depletion of LEDGF/p75 (LEDGF KD) resulted in a more random distribution, with tiles displaying a less pronounced blue or yellow color and shifting towards black. This tendency was more pronounced for ΔN93-LEDGF and the HPV E2 and LANA(1-31) peptide fusions than for LEDGF KD (Fig 6B).
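The heat map tiles above are scored as ROC curve areas that compare a feature value (for example gene density around each site) between integration sites and their matched random controls. The sketch below is a generic illustration of that statistic, not the INSIPID implementation: it computes the area as the probability that a randomly chosen integration site has a higher feature value than a randomly chosen control, counting ties as one half (equivalent to the normalized Mann-Whitney U statistic). An area of 0.5 corresponds to a random distribution, values above 0.5 to enrichment and values below 0.5 to depletion. The feature values are hypothetical.

```python
from itertools import product


def roc_area(experimental: list[float], controls: list[float]) -> float:
    """ROC curve area comparing a feature between integration sites and matched
    random controls: P(experimental > control) + 0.5 * P(tie).

    0.5 = no difference from controls; >0.5 = enriched; <0.5 = depleted.
    """
    if not experimental or not controls:
        raise ValueError("both site sets must be non-empty")
    score = 0.0
    for x, y in product(experimental, controls):  # all experimental/control pairs
        if x > y:
            score += 1.0
        elif x == y:
            score += 0.5
    return score / (len(experimental) * len(controls))


# Hypothetical gene densities (genes per window) around each site
integration_sites = [12.0, 9.0, 15.0, 11.0]
mrc_sites = [4.0, 9.0, 6.0, 3.0, 7.0, 5.0]
print(roc_area(integration_sites, mrc_sites))
```

The pairwise loop is O(n·m) and is chosen here for readability; a rank-based Mann-Whitney computation gives the same value far faster for datasets of tens of thousands of sites.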

Artificial peptide-LEDGF/p75 hybrids result in a safer integration profile
Together, the data presented above indicate that lentiviral vector integration preferences are defined by LEDGF/p75 as a cellular tether and are mostly dictated by its N-terminal PWWP domain. Mere deletion of this domain, or its replacement with alternative chromatin-interacting modules, redistributes vector integration sites in a more random fashion. The question remains whether the redistribution of proviral integration sites obtained with our LEDGF-hybrids also translates into a safer therapy, with a lower chance of insertional mutagenesis. To get a better view of the safety profile, we calculated integration frequencies relative to a set of previously defined criteria that mark potentially unsafe integration events [59,66]: proximity to transcription start sites (<50 kb), oncogenes (<300 kb) or miRNA-coding regions (<300 kb), and location within transcription units or ultraconserved elements. The large window sizes impose a very stringent selection, so integration events located away from all these features can be considered safer [59]. For each data set we evaluated the percentage of unsafe integrations (Fig 7 and S5 Fig) and, in addition, determined the percentage of safe sites (events not captured by any of the criteria; Fig 7, % safe). In the parental cell line, only 5.4% of all LV integration sites may be considered safe. LEDGF/p75 depletion results in a shift to 16.3% safe sites (p<0.005, Pearson's χ2 test compared to the LEDGF WT control condition), a phenotype that was fully reverted upon LEDGF/…

Fig 6. LEDGF-hybrids retarget lentiviral integration towards a more randomized pattern. (a) Genomic heat map comparing integration site data sets obtained from HeLaP4 LEDGF/p75 KD cells overexpressing different artificial LEDGF-hybrids to genomic features.
Tile color depicts the correlation of an integration dataset with the respective genomic feature (left) relative to matched random controls, as indicated by the colored receiver operating characteristic (ROC) curve area scale at the bottom of the panel. Statistical significance (asterisks, ***p<0.001, ranked Wald tests) is shown relative to the LEDGF KD population (double dash). Columns indicate different data sets, while rows indicate different genomic features analyzed (described in [63]). LANA, Latency associated nuclear antigen; HPV, Human papilloma virus; PFV, Prototype foamy virus; LEDGF, Lens epithelium-derived growth factor. (b) Epigenetic heat map comparing integration site data sets obtained from HeLaP4 LEDGF/p75 KD cells overexpressing different artificial LEDGF-hybrids to epigenetic features. Tile color depicts a positive or negative correlation with the respective epigenetic feature (10 kb windows) relative to MRC, as indicated by the receiver operating characteristic (ROC) curve area scale at the bottom of the panel. Statistical significance (asterisks, ***p<0.001, ranked Wald tests) is shown relative to the LEDGF KD population (dashed). Columns indicate different data sets, while rows indicate different epigenetic features analyzed. Included features were limited to those identified in high-throughput studies performed in HeLaP4 and primary CD4+ T-cells. Detailed information on epigenetic marks and their roles can be found in [87,88]. LANA, Latency associated nuclear antigen; HPV, Human papilloma virus; PFV, Prototype foamy virus; LEDGF, Lens epithelium-derived growth factor.
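The safe harbor analysis in this section classifies an integration site as unsafe when it lies within 50 kb of a transcription start site, within 300 kb of an oncogene or miRNA-coding region, or inside a transcription unit or ultraconserved element; the remaining sites count as safe. The minimal sketch below applies those criteria to a hypothetical single-chromosome annotation, assuming inclusive interval coordinates and strict "<" distance thresholds. Class and function names are illustrative only and do not reproduce the authors' analysis code; real analyses use genome-wide annotation tables.

```python
from dataclasses import dataclass, field

Interval = tuple[int, int]  # (start, end), inclusive, same chromosome


@dataclass
class SafeHarborAnnotation:
    """Hypothetical single-chromosome annotation for the five criteria."""
    tss: list[int] = field(default_factory=list)
    oncogenes: list[Interval] = field(default_factory=list)
    mirnas: list[Interval] = field(default_factory=list)
    transcription_units: list[Interval] = field(default_factory=list)
    ultraconserved: list[Interval] = field(default_factory=list)


def distance_to_interval(pos: int, interval: Interval) -> int:
    """0 if pos lies inside the interval, otherwise bp to the nearest edge."""
    start, end = interval
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def unsafe_reasons(pos: int, ann: SafeHarborAnnotation) -> list[str]:
    """List every criterion that flags this site as unsafe (empty list = safe)."""
    reasons = []
    if any(abs(pos - t) < 50_000 for t in ann.tss):
        reasons.append("TSS < 50 kb")
    if any(distance_to_interval(pos, g) < 300_000 for g in ann.oncogenes):
        reasons.append("oncogene < 300 kb")
    if any(distance_to_interval(pos, m) < 300_000 for m in ann.mirnas):
        reasons.append("miRNA < 300 kb")
    if any(distance_to_interval(pos, u) == 0 for u in ann.transcription_units):
        reasons.append("inside transcription unit")
    if any(distance_to_interval(pos, u) == 0 for u in ann.ultraconserved):
        reasons.append("inside ultraconserved element")
    return reasons


def percent_safe(sites: list[int], ann: SafeHarborAnnotation) -> float:
    """Percentage of sites not captured by any of the five criteria."""
    if not sites:
        raise ValueError("no integration sites given")
    safe = sum(1 for s in sites if not unsafe_reasons(s, ann))
    return 100.0 * safe / len(sites)


# Hypothetical toy annotation and integration sites
toy = SafeHarborAnnotation(
    tss=[1_000_000],
    oncogenes=[(2_000_000, 2_050_000)],
    mirnas=[(3_000_000, 3_000_100)],
    transcription_units=[(1_000_000, 1_020_000)],
    ultraconserved=[(4_000_000, 4_000_300)],
)
sites = [1_010_000, 2_200_000, 4_000_100, 5_000_000]
print(percent_safe(sites, toy))  # 25.0
```

Returning the list of triggered criteria, rather than a single boolean, allows the per-criterion percentages of unsafe integrations to be tabulated from the same pass.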

Discussion
Integration of retroviral vectors into the host cell genome makes them invaluable tools for gene therapeutic applications where life-long correction is key. Previous reports showed effective gene transfer enabling long-term gene correction (for a review, see [71]). However, severe adverse events in these clinical studies (using full-LTR-driven gammaretroviral vectors) raised serious concerns regarding the safety of gene therapy with integrating vectors derived from the retrovirus family [14,15]. The γRV preference for integration into enhancer regions and the concomitant activation of proto-oncogenes led to malignant transformation of cells and clonal expansion [10][11][12]. This triggered multiple studies aimed at increasing the safety of retroviral vectors, including the use of other genera (lentiviral or alpharetroviral instead of gammaretroviral vectors), SIN-LTR design [72][73][74][75], tissue-specific promoters [76], modified integration properties [45][46][47] and insulator sequences as enhancer and silencer blockers [77]. Meanwhile, lentiviral vectors have become the retroviral vectors of choice for gene transfer and clinical gene therapy owing to their safer integration profile and lower genotoxicity in preclinical models. As such, any successful modification that reduces integration of these vectors into gene coding regions may be relevant for clinical translation. Stable integration, however, will always carry the intrinsic risk of vector-induced genomic perturbation, such as open reading frame disruption leading to loss of function, or transcriptional deregulation of neighbouring genes, as illustrated by the report of SIN-LV-induced aberrant splicing [78]. In addition, LV integration may also lead to clonal dominance, as reported in the β-thalassemia trial, which could be an indicator of impending malignant transformation [19].
Additional insight into the molecular mechanism of integration and integration site selection is therefore important for LVs to be accepted for general therapeutic use. We and others contributed substantially to elucidating the role of LEDGF/p75 as a molecular tether of lentiviral vector integration. As a cellular cofactor of lentiviral integration, LEDGF/p75 orchestrates the integration preference by binding H3K36me3 in the body of active transcription units via its N-terminal PWWP domain, whereas the vector-encoded integrase catalyzes the integration reaction. Depletion of LEDGF/p75 by knockdown or knockout strategies shifts lentiviral vector integration out of active genes, yet integration is not completely random [67,79], which can at least in part be explained by residual targeting via HRP-2 [67]. Here we set out to study whether LEDGF-hybrids could be generated that distribute lentiviral integration sites more randomly. This line of vector development is motivated by the growing interest in new vector platforms with a close-to-random insertional profile, which could reduce the probability of proto-oncogene activation and thereby lower the genotoxic potential [51,80,81]. To achieve a more random integration site distribution, we deleted the specific chromatin-binding PWWP module of LEDGF/p75 (aa 1-93) or replaced it with alternative pan-chromatin binding modules. The PWWP domain of LEDGF/p75 has been shown to recognize H3K36me3, a chromatin mark that is particularly enriched in the body of active transcription units [32][33][34][35][36].
Complementation of LEDGF-depleted cells with a LEDGF/p75 protein that had its PWWP domain deleted (ΔN93-LEDGF) or replaced with alternative chromatin-binding modules revealed a unique subnuclear distribution for each construct, indicating that deletion of the PWWP domain, or its replacement with any of the other peptides, specifically redistributes the artificial LEDGF chimeras within the nuclear compartment (Fig 3). This phenotype can be attributed to the AT-hook motifs and charged regions present in the N-terminal part of ΔN93-LEDGF, together with the specific peptides that replaced the PWWP domain. Integration site analysis showed that, for most constructs, lentiviral integration was more randomly distributed than under LEDGF-depleted conditions (genomic and epigenetic heat map representations; Fig 6A and 6B), except for PFV Gag(534-546)-ΔN93-LEDGF. In the latter cells, LV integration remained enriched near epigenetic marks of transcriptionally active chromatin, albeit less pronounced than in LEDGF WT and LEDGF BC cells (S4B Fig). Interestingly, peptide addition was not required to obtain a more random distribution. Lentiviral integrations in ΔN93-LEDGF-expressing cells were redistributed in a fairly random manner, with tile colors shifting to grey and black (for the genomic and the epigenetic heat map representations, respectively), indicating that integration frequencies near these features were neither enriched nor depleted compared to the matched random integration site distribution. Comparison with LEDGF KD shows that integration is more randomly distributed than under LEDGF depletion (***p<0.001, Wald statistics; Fig 6A and 6B). Fusion of short pan-chromatin binding peptides to the truncated ΔN93-LEDGF resulted in similar shifts towards a more randomized integration profile.
The fact that all peptide fusions display a unique subnuclear location suggests that their interactions with chromatin differ. Even though the overall integration frequencies are highly similar (considering the genomic and epigenetic features analyzed), larger integration site datasets (>10e5 sites) would be required for a more detailed analysis of specific subsets. To estimate the effect of the more randomized distribution on safety, we calculated the frequency of integration relative to a set of safe harbor criteria for the individual integration site datasets [59]. This analysis showed that the more random distributions resulted in a lower genotoxic profile: 18-22% of integrations met the safe harbor criteria for our LEDGF chimeras, compared to only 5.4% in cells carrying wild-type LEDGF/p75, so all LEDGF chimeras yielded a safer distribution over the genome. Fully targeted integration into safe harbor regions such as the AAVS1 or CCR5 locus would be the ultimate solution to circumvent insertional mutagenesis [59,66,82]. Several methods for site-directed gene correction have been developed using genetic scissors based on zinc-finger nucleases, transcription activator-like effector nucleases or, more recently, RNA-guided nucleases (CRISPR/Cas9) (for a review, see [83]). However, site-directed integration would undoubtedly impair transduction efficiencies. Our approach improves the therapeutic potential of lentiviral vectors by decreasing the risk/benefit ratio while still supporting high transduction efficiencies. The fact that integration can be directed to genomic regions that are targeted neither under wild-type nor under LEDGF-depleted conditions indicates that integration in these areas is disfavored because of the absence of a tether, rather than because of specific obstacles such as steric hindrance from the condensed chromatin structure.
As an alternative to generating stable cell lines as done here, we demonstrated earlier that mRNA electroporation ensures timely, high-level recombinant protein expression that is sufficient to retarget lentiviral vector integration [48]. When combined with IN-mutant lentiviral vectors that selectively bind complementary LEDGF/p75 variants [84], this approach should be broadly applicable to introduce therapeutic or suicide genes for cell therapy, such as the genetic modification of patient-specific iPS cells, and to improve the safety of lentiviral vectors. Because adverse effects are of a multifactorial nature [85], novel therapeutic approaches should be evaluated in relevant functional assays that can predict the cytotoxicity observed in vivo [86]. A continuous effort aimed at abolishing the risk of insertional mutagenesis will be required for gene therapy to become a broadly accepted treatment alternative.

Genomic heat maps comparing integration site data sets obtained from HeLaP4 LEDGF/p75 KD cells overexpressing different artificial LEDGF-hybrids to genomic features. Tile color depicts the nature of the correlation of an integration dataset with the respective genomic feature (left) relative to matched random controls, as indicated by the colored receiver operating characteristic (ROC) curve area scale at the bottom of the panel. Statistical significance (asterisks, ***p<0.001, ranked Wald tests) is shown relative to (a) ΔN93-LEDGF or (b) LEDGF BC, respectively (double dash). Columns show different data sets, while rows indicate different genomic features analyzed (described in [63]). LANA, Latency associated nuclear antigen; HPV, Human papilloma virus; PFV, Prototype foamy virus; LEDGF, Lens epithelium-derived growth factor.

S4 Fig. LEDGF-hybrids retarget lentiviral integration towards a more randomized pattern.
Epigenetic heat map comparing integration site data sets obtained from HeLaP4 LEDGF/p75-depleted cells overexpressing different artificial LEDGF-hybrids to epigenetic features, generated using the INSIPID software (Bushman Lab, University of Pennsylvania). Tile color depicts a positive or negative correlation with the respective epigenetic feature (10 kb windows) relative to matched random controls, as indicated by the receiver operating characteristic (ROC) curve area scale at the bottom of the panel. Statistical significance (asterisks, ***p<0.001; ranked Wald tests) is shown relative to (a) ΔN93-LEDGF or (b) LEDGF BC, respectively (double dash). Significance is reached when p<0.001, compared to MRC. Columns indicate different data sets, while rows indicate different epigenetic features analyzed. Included features were limited to those identified in high-throughput studies performed in HeLa and primary CD4+ T-cells. Detailed information on epigenetic marks and their roles can be found in [87,88]. LANA, Latency associated nuclear antigen; HPV, Human papilloma virus; PFV, Prototype foamy virus; LEDGF, Lens epithelium-derived growth factor; MRC, matched random control.

… [66], miRNA-encoding regions, transcription units and ultraconserved regions) that, when hit, are considered unsafe as defined in [59] (dataset details are described in the Materials and Methods section). These features are thus used to define safe harbors as regions that fall outside all criteria. Percentages depict the fraction of integrations falling within the corresponding range for each criterion. The percentage of integrations falling outside all five features is used to calculate the safety profile.