The GATA1-HS2 Enhancer Allows Persistent and Position-Independent Expression of a β-globin Transgene

Gene therapy of genetic diseases requires persistent and position-independent expression of a therapeutic transgene. Transcriptional enhancers binding chromatin-remodeling and modifying complexes may play a role in shielding transgenes from repressive chromatin effects. We tested the activity of the HS2 enhancer of the GATA1 gene in protecting the expression of a β-globin minigene delivered by a lentiviral vector in hematopoietic stem/progenitor cells. Gene expression from proviruses carrying GATA1-HS2 in both LTRs was persistent and resistant to silencing at most integration sites in the in vivo progeny of human hematopoietic progenitors and murine long-term repopulating stem cells. The GATA1-HS2-modified vector allowed correction of murine β-thalassemia at low copy number without inducing clonal selection of erythroblastic progenitors. Chromatin immunoprecipitation studies showed that GATA1 and the CBP acetyltransferase bind to GATA1-HS2, significantly increasing CBP-specific histone acetylations at the LTRs and β-globin promoter. Recruitment of CBP by the LTRs thus establishes an open chromatin domain encompassing the entire provirus, and increases the therapeutic efficacy of β-globin gene transfer by reducing expression variegation and epigenetic silencing.

Introduction
β-thalassemias are a group of autosomal recessive disorders characterized by reduced or absent synthesis of hemoglobin β-chains. Allogeneic bone marrow (BM) transplantation from HLA-matched related donors provides a definitive cure for this disease [1]. Gene therapy, i.e., transplantation of autologous, genetically corrected hematopoietic stem cells (HSCs), is a potential alternative for patients lacking a compatible donor. Gene therapy of β-thalassemia requires efficient transfer and appropriately regulated expression of a β-globin transgene. Lentiviral vectors (LV) carrying a human β-globin gene under the control of its minimal promoter and elements of the β-globin locus-control region (LCR) transduce repopulating HSCs and allow long-term correction of the murine and human β-thalassemia phenotype in pre-clinical models [2,3,4,5,6]. Limited therapeutic efficacy has been recently observed in one patient treated with transduced CD34+ human hematopoietic stem/progenitor cells (HPCs) [7].
A significant limitation of existing β-globin vectors is variegation and epigenetic silencing of transgene expression, which the activity of the β-globin LCR is not sufficient to overcome [8]. Several attempts have been undertaken to reduce chromatin-mediated silencing, such as the introduction of insulators or scaffold/matrix-attachment regions (S/MARs) in the vector backbone. Insertion of a 1.2-kb insulator derived from a chicken β-globin DNase hypersensitive site (cHS4) in the viral long terminal repeats (LTRs) led to complete correction of the thalassemic phenotype in the erythroblastic progeny of transduced human HPCs in vitro and in vivo [4,9]. Insertion of the cHS4 in the LTRs has, however, detrimental effects on the vector titer, which are partially overcome by reducing the insulator size [10,11]. A 573-bp insulator isolated from the sea urchin arylsulfatase locus (ArsI) was reported to protect LV from silencing by maintaining an active chromatin structure in an orientation-dependent manner, though not in all cell types or differentiation stages [12,13]. Another sea urchin chromatin insulator, sns5, was shown to protect a gammaretroviral vector from the negative influence of chromatin in a mouse erythroid cell line [14]. Addition of S/MARs from the human β-interferon or immunoglobulin heavy chain loci, alone or in combination with insulators and enhancers, reduces silencing and provides position-independent transgene expression in hematopoietic cell lines and primary cells [15,16,17]. More recently, a ubiquitously acting chromatin opening element (UCOE) from the human HNRPA2B1-CBX3 locus has been successfully used to minimize the influences of neighboring chromatin on single-copy transgene expression in hematopoietic cells [18,19].
An alternative strategy to protect an integrated expression cassette from the effect of the surrounding chromatin is the use of transcriptional enhancers, such as the metallothionein or α-globin HS40 enhancer or the DNase hypersensitive site II (HS2) of the β-globin LCR, which relieve position effects at integration sites in cultured cells [20,21,22,23]. In the class of erythroid-specific enhancers, GATA1-HS2 is a 200-bp element from the gene encoding GATA1 [24], a transcriptional activator that operates as a general switching factor for erythroid development [25]. GATA1 binds to the consensus sequence WGATAR within globin LCRs and regulatory regions of erythroid-specific genes and interacts with CBP, a histone acetyltransferase (HAT) that stimulates GATA1 transcriptional activity [26] by inducing an open chromatin configuration. CBP can directly acetylate GATA1 and, consequently, facilitates GATA1 chromatin occupancy [27,28]. The HS2 element contains a high-affinity, palindromic GATA1 binding element that mediates positive autoregulation of GATA1 expression [29,30] and binds the transcription factors CP2 [31] and BCL11A [32].
In this study, we tested the ability of GATA1-HS2 to counteract repressive chromatin effects in the context of LV-delivered transgenes. The element was inserted into the LTRs of a LV expressing a GFP gene under the control of the silencing-prone cytomegalovirus (CMV) promoter or a β-globin minigene under the control of a reduced-size LCR. The activity of these vectors was tested in the progeny of transduced, repopulating murine HSCs and human CD34+ cells. We show that GATA1-HS2 increases the probability of long-term transgene expression and rescues the thalassemic phenotype at low copy number in a murine model of the disease. Chromatin immunoprecipitation (ChIP) studies showed that GATA1 and CBP bind GATA1-HS2-containing LTRs and induce hyperacetylation at specific histone residues throughout the provirus. These results show that including in the LV LTRs sequences co-targeted by transcription factors and chromatin-modifying enzymes is an effective approach to shield integrated transgenes from negative chromosomal position effects.

A strategy to achieve persistent and position-independent transgene expression in erythroblastic cells
We previously reported that the GATA1-HS2 element cloned in a transcriptionally active viral LTR drives erythroid-restricted transgene expression in both MLV- and HIV-derived vectors [33,34]. To test whether GATA1-HS2 may positively influence long-term transgene expression in erythroblastic cells, a previously described [35] LV expressing GFP from the CMV promoter (CMV-GFP, Figure 1A) was modified by cloning the 200-bp HS2 element in the 3′ LTR, to obtain the G-CMV-GFP vector (Figure 1B). The CMV promoter was chosen because of its high susceptibility to silencing in vitro and in vivo [18]. The activity of the CMV-GFP and G-CMV-GFP vectors was comparatively analyzed following transduction of murine HSCs and human cord blood-derived CD34+ cells.
To analyze gene expression in the clonal progeny of long-term HSCs, BM from primary transplanted animals (3 different donors for each vector; Table S1) was transplanted in secondary CD45.2 C57BL/6 recipients (n = 8 per group, Table S1). A total of 66 12-day colonies derived from spleen colony-forming unit (CFU-S) cells were isolated and analyzed by PCR for the presence of the provirus, and by cytofluorimetry for Ter119 and GFP expression. The proportion of Ter119-bright cells was comparable in both groups (data not shown). We observed a >2-fold increase in the percentage of GFP+ colonies carrying the G-CMV-GFP vector compared to those carrying the CMV-GFP vector (85% vs. 40%; P<0.001, Figure 2B). The average VCN detected by qPCR analysis was 1.4 and 1.8 for G-CMV-GFP and CMV-GFP colonies, respectively (not shown). These data indicate that the HS2 element significantly increases the probability of transgene expression in the clonal erythroid progeny of murine repopulating HSCs by reducing stable and/or variegating position effects.
The GATA1-HS2 element maintains a high level of transgene expression in the erythroid progeny of human HPCs in vitro and in vivo
We then tested the activity of CMV-GFP and G-CMV-GFP vectors in human cord blood-derived CD34+ cells. To analyze transgene expression at the clonal level, CMV-GFP- and G-CMV-GFP-transduced CD34+ cells were cultured in methylcellulose for 14 days, and single colonies were scored for GFP expression by fluorescence microscopy and analyzed for the presence of vector sequences by PCR. The vast majority (90%) of cells from BFU-Es (erythroid burst-forming units) carrying the G-CMV-GFP vector expressed the transgene, compared to a significantly lower proportion (63%) of those harboring the CMV-GFP vector (P<0.005, Figure 3A), while no significant difference was observed in the proportion of GFP+ cells from CFU-GM (granulocyte-macrophage colony-forming units) between the two groups (75% vs. 68%, P>0.1, Figure 3B).
Transduced HPCs were then assayed in vivo after transplantation in sub-lethally irradiated NOD-SCID mice (n = 5 and n = 4 for the CMV-GFP and G-CMV-GFP vector, respectively). Transduction efficiency ranged from 70 to 80% for both vectors. Eight to 11 weeks after transplantation, the engraftment of SCID-repopulating cells (SRCs) ranged from 8 to 28%, as determined by the proportion of human CD45+ cells in the mouse BM. Staining with antibodies against lineage markers (CD19, CD13, and CD34) showed the presence of human lymphoid, myeloid, and undifferentiated progenitors, indicating multilineage differentiation of SRCs in both mouse groups (data not shown). Since human erythroid differentiation occurs at a low level in NOD-SCID chimeras, we tested GFP expression in the erythroblastic progeny of transduced SRCs ex vivo, in clonal cultures preferentially supporting the growth of human progenitors. BFU-E colonies were scored for both GFP expression and the presence of proviral vectors. Overall, 33 out of 40 (83%) of the G-CMV-GFP-transduced colonies expressed the transgene, compared to 33 out of 53 (62%) in the case of CMV-GFP (P<0.05, Figure 3C). The percentage of transduced CFU-GM colonies expressing the transgene was comparable for the two vectors (77% and 75% for CMV-GFP and G-CMV-GFP, respectively; Figure 3D). These data show that GATA1-HS2 partially protects a CMV-driven transgene from silencing in the erythroid progeny of human HPCs and SRCs.
The GATA1-HS2 element improves correction of the murine β-thalassemia phenotype by low copy number of a β-globin vector
We previously showed that the GLOBE LV, expressing a human β-globin gene under the control of the β-globin promoter and a reduced-size LCR (Figure 1C), is able to correct the murine β-thalassemia phenotype by providing a selective advantage to genetically corrected erythroblasts [5]. We therefore tested the effect of introducing a copy of GATA1-HS2 in both LTRs of the GLOBE vector (G-GLOBE, Figure 1D) on the expression of the β-globin gene in a thalassemic mouse model. We used heterozygous C57BL/6-Hbb^th3 knock-out mice (th3/+) [36], which show a pathophysiology comparable to that of patients with severe β-thalassemia intermedia, that is, chronic anemia, anomalies in RBC size and shape, and ineffective erythropoiesis. We transplanted 16 lethally irradiated CD45.2 th3/+ mice with BM cells from co-isogenic CD45.1 th3/+ donors, transduced with either GLOBE or G-GLOBE (n = 8 per group). Control mice were transplanted with either wild-type cells (WT control, n = 9) or mock-transduced thalassemic cells (th3/+ control, n = 8). Donor chimerism was >90% at 9-12 months after transplantation in all mice. The extent of phenotypic correction was similar in mice transplanted with G-GLOBE- and GLOBE-transduced BM: Hb level, ineffective erythropoiesis (evaluated as percentage of Ter119+ cells in BM), hematocrit, RBC and reticulocyte counts were equally improved compared to th3/+ controls (Figure 4A, 4B and Figure S2). Likewise, expression of human β-globin, as analyzed by FACS and HPLC, was comparable in the two groups (Table S2). However, the average VCN in the G-GLOBE-transduced Ter119+ cells was significantly lower than in GLOBE-transduced erythroblasts (1.02 ± 0.36 vs. 1.73 ± 0.40, P<0.05), while no difference was observed in the Ter119−, myeloid population between the two groups (0.70 ± 0.20 vs. 0.83 ± 0.30) (Figure 4C). No significant difference in the VCN of Ter119+ vs. Ter119− cells was observed in th3/+ mice transplanted with HSCs transduced with a GFP-expressing vector (0.81 ± 0.34 vs. 0.78 ± 0.32; P>0.1, Figure 1E and Figure 4C). These results indicate that G-GLOBE is able to correct the murine thalassemia at an average VCN of 1 without inducing selection of erythroblasts harboring a higher number of proviruses, as observed for the GLOBE vector.
The GATA1-HS2 element prevents silencing of a β-globin transgene in the progeny of long-term repopulating HSCs
To test the effect of GATA1-HS2 on β-globin transgene expression in the clonal progeny of long-term repopulating HSCs, BM cells harboring 0.3-1 GLOBE and 0.2-1 G-GLOBE average vector copies per cell were harvested from primary transplanted mice (4 donors per group) 12 months after transplantation and injected in wild-type secondary recipients. We mapped 2 to 15 different predominant proviral integration sites in the total BM of mice in the GLOBE group and 3 to 15 in mice in the G-GLOBE group, as analyzed by LM-PCR and sequencing (Table S1). Individual colonies from 12-day secondary CFU-S were isolated from recipient mice and analyzed for the presence of the provirus and for β-globin expression. qPCR analysis revealed that all transduced colonies harbored a single proviral copy. Human β-globin synthesis, detected by FACS analysis, was observed in erythroblasts from 14 out of 24 G-GLOBE-containing splenic colonies (58%), compared to 10 out of 40 GLOBE-containing colonies (25%), indicating that GATA1-HS2 significantly increases (P<0.005) the probability of transgene expression in the progeny of long-term HSCs (Figure 5A). Interestingly, this effect was not apparent in primary colonies from 12-day CFU-S: 10 out of 12 (83%) colonies harboring GLOBE and 15 out of 17 (88%) colonies harboring G-GLOBE expressed human β-globin (P>0.1) (Figure 5B). The average VCN was 1.6 for GLOBE-transduced and 1.4 for G-GLOBE-transduced primary colonies.
Proviral integration sites in secondary CFU-S-derived colonies were individually sequenced and mapped by LM-PCR. Five out of 11 GLOBE and 10 out of 12 G-GLOBE integrations were associated with transgene expression, while the remaining proviruses were always silent (Tables S3 and S4), showing that a G-GLOBE provirus is less dependent on its genomic position than a GLOBE provirus for transgene expression (45% of GLOBE vs. 83% of G-GLOBE proviruses were active, Figure 5C). Indeed, some integrations (3 G-GLOBE and 2 GLOBE), independently identified in different colonies, were variably active (Tables S3 and S4), indicating that position effect variegation does occur in the progeny of long-term repopulating HSCs. The number of these events was too small to establish whether variegation is reduced by the presence of GATA1-HS2 in the LTRs.
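The colony counts reported above can be checked with Fisher's exact test, the test the statistical analysis section names for comparing two proportions. The following is a minimal sketch in plain Python, assuming a two-sided test. The function name `fisher_exact_two_sided` is illustrative and not part of the original analysis, which was run in GraphPad Prism, so the P values it returns need not coincide with the published thresholds.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Counts taken from the text: (expressing, silent) colonies per vector.
secondary = fisher_exact_two_sided(14, 10, 10, 30)  # G-GLOBE 14/24, GLOBE 10/40
primary = fisher_exact_two_sided(15, 2, 10, 2)      # G-GLOBE 15/17, GLOBE 10/12

print(f"secondary CFU-S: P = {secondary:.4f}")
print(f"primary CFU-S:   P = {primary:.4f}")
```

The enumeration is exact and needs only the standard library, which is practical at colony counts of this size; for larger tables a statistics package would be the more usual choice.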

The integration pattern of the G-GLOBE vector is not biased by the presence of the GATA1-HS2 element in the LTRs
We previously showed that the introduction of the MLV enhancer in the HIV LTR causes subtle changes in the integration pattern of LV [37]. To test whether the presence of the GATA1-HS2 element could bias LV integration, we analyzed the integration site distribution of the CMV-GFP and G-CMV-GFP vectors in the genome of human CD34+ HPCs. A total of 438 CMV-GFP and 305 G-CMV-GFP integration sites were mapped by LM-PCR, annotated as TSS-proximal, intragenic and intergenic, and their relative distribution compared to that of 9,974 random control sequences (Figure S3A). Both vectors integrate preferentially into genes (71.2% and 69.5% for CMV-GFP and G-CMV-GFP, respectively, compared to 36.1% in the control sequences, Figure S3A) with no preference for TSSs (Figure S3B). To correlate vector integration with gene activity, we determined the gene expression profile of CD34+ cells at the time of transduction by Affymetrix microarray analysis [6], and analyzed the expression level of the genes targeted by the vectors. As expected [38], both LV preferentially targeted active genes compared to random controls (Figure S4). Finally, we correlated vector integration sites with epigenetic modifications obtained from previously published ChIP-sequencing data sets [39]. The ±1-kb regions flanking CMV-GFP and G-CMV-GFP integration sites were equally enriched in PolII binding sites and transcription-associated H3K4me1, H3K36me3, H4K20me1, H3K9me1, and H3K27me1 histone modifications compared to random sequences, with no differences between CMV-GFP and G-CMV-GFP (Figure S5). Similarly, H3K4me3, H3K9me3 and H3K27me3 modifications and binding of the H2A.Z histone variant were under-represented around both LV integrations (Figure S5). The presence of GATA1-HS2 in the LTR therefore has no apparent influence on the selection of integration sites of the LV.
Figure 2. (A) Representative FACS analysis of the BM sub-populations of mice transplanted with CMV-GFP- (mice #6 and #7) and G-CMV-GFP- (mice #14 and #15) transduced cells and sacrificed 12 months after BMT. The cells were stained with an antibody that recognizes the erythroid-specific marker Ter119. The presence of GATA1-HS2 in the LTRs (G-CMV-GFP vector) is able to sustain transgene expression in the Ter119+ BM erythroid compartment. (B) Single erythroid CFU-S were collected from the spleens of secondary recipients 12 days after transplantation with CMV-GFP- and G-CMV-GFP-containing BM cells (7 and 9 mice, respectively). Donor BM cells were derived from primary transplanted mice that had received CMV-GFP- and G-CMV-GFP-transduced BM cells (3 different donors for each vector). Vector-containing clones carrying a comparable number of proviral copies were analyzed by FACS for transgene expression. The percentages of GFP+ (light grey) and GFP− (dark grey) vector-positive (LV+) CFU-S are represented. Transgene expression was observed in 40% of CMV-GFP-containing CFU-S, whereas the presence of the GATA1-HS2 element in the LTRs was able to sustain GFP expression in 85% of G-CMV-GFP-transduced CFU-S. The difference in the proportion of transgene-expressing CFU-S between the two groups was statistically significant (P<0.001; Fisher exact test). doi:10.1371/journal.pone.0027955.g002

GATA1-HS2 reduces proviral silencing: exploring the mechanism
GATA1 regulates all known erythroid-specific genes, often through association with the CBP histone acetyltransferase [40]. To investigate whether GATA1 binds to the HS2-containing LTRs and recruits CBP, ChIP experiments were carried out in GATA1-expressing HEL cells transduced with either GLOBE or G-GLOBE LV, using antibodies against GATA1 and CBP. PCR primer pairs were used to amplify sequences corresponding to the 5′ LTR by qPCR analysis (Figure 6A). As expected, GATA1 was highly enriched at GATA1-HS2-containing LTRs (Figure 6B). Accordingly, CBP showed occupancy above background only in G-GLOBE-transduced cells. GATA1 and CBP occupancy at the endogenous β-globin LCR HS4 element, chosen as an internal control, was similar in GLOBE- and G-GLOBE-transduced cells (Figure S6).
We then measured histone acetylation levels at different regions in the GLOBE and G-GLOBE proviruses in transduced HEL cells. Primers specific for the 5′ LTR and the β-globin promoter (β-promoter-HS2) were used to amplify ChIP products by qPCR (Figure 7A). Chromatin was immunoprecipitated using antibodies against pan-acetylated H3 and specific acetylated lysines of H3 and H4 (H3K18, H4K5 and H4K8), which are targets of CBP activity. The G-GLOBE provirus exhibited higher H3 and H4 acetylation levels at the 5′ LTR compared to the GLOBE provirus (Figure 7B). This difference is maintained at the β-globin promoter region (Figure 7C), suggesting the spreading of acetylated histone marks along the chromatin fiber. Histone acetylation levels of the control, transcriptionally active Aldolase A gene were comparable in GLOBE- and G-GLOBE-transduced cells (Figure S7). Concomitantly, H3K4 trimethylation (H3K4me3), typical of active promoters, tended to be higher at the 5′ LTR of G-GLOBE compared to GLOBE, whereas two heterochromatin marks (H3K9me3 and H3K27me3) were reduced (Figure S8A and S8B). H3K4me1, which marks enhancer regions [41], was increased at the GATA1-HS2-containing 5′ LTR (Figure S8B). Overall, these data indicate that GATA1, together with CBP, maintains an open chromatin structure by increasing the histone acetylation levels in the chromatin domain delimited by the GATA1-HS2-modified LTRs.

Discussion
Gene therapy of thalassemia requires vectors expressing high, persistent and regulated levels of β-globin. The use of β-globin promoters, enhancers, and reduced versions of the LCR allows the expression of potentially therapeutic levels of β-globin in the context of LV, as demonstrated in human thalassemic erythroblasts in vitro [6,9] and in murine models of thalassemia in vivo [2,5,42,43]. LV integrate in different sites in the genome, and although they prefer open and transcribed chromatin regions [44], their expression is susceptible to chromosomal position effects leading to transgene silencing or variegated expression [2,8,45]. The reduced versions of the β-globin LCR are apparently insufficient to overcome position-dependent variability of gene expression in erythroblasts. Chromatin insulators, such as the cHS4 element, reduce positional effects only partially, while affecting both titer and genetic stability of the vectors [7,10,11]. The outcome of the only patient treated so far with a β-globin LV indicates a limited contribution of the vector-derived protein in the presence of a low number of engrafted transduced HPCs [7]. Low transduction efficiency and genetic instability play a significant role in this still sub-optimal vector performance.

Figure 4. [...] The proportion of Ter119+ erythroid cells was not significantly different in th3/+ mice transplanted with GLOBE- and G-GLOBE-transduced HSCs. (C) Lack of selection of erythroid progenitors carrying multiple copies of G-GLOBE. The Ter119+ fraction was sorted from the non-erythroid (Ter119−) sub-population of BM cells from mice transplanted with GLOBE-, G-GLOBE- and LV GFP-transduced cells and VCN was determined by qPCR. The proportion of GLOBE-transduced cells in the Ter119+ fraction was 2.5-fold higher than that observed in the Ter119− compartment, indicating an in vivo selective advantage of highly transduced erythroid progenitors [5] (**P<0.01). In marked contrast, no statistically significant difference was observed for mice receiving G-GLOBE- and GFP-transduced BM cells (P>0.1), thus indicating that 1 copy of G-GLOBE per cell is sufficient to produce normal Hb levels (Figure 4A). Interestingly, G-GLOBE is able to rescue the thalassemic phenotype with a significantly lower VCN compared to GLOBE (*P<0.05). doi:10.1371/journal.pone.0027955.g004
Transcriptional enhancers can suppress negative chromatin influences on transgene expression [21,22], and might act more consistently and in a larger variety of chromatin contexts compared to conventional insulators or barrier elements. Enhancers may also recruit a promoter to a nuclear compartment in which transcription is stably heritable through cell generations [22]. GATA1 is a master transcriptional activator in erythroid development, binding to regulatory elements of most erythroid-specific genes, including the globin enhancers and LCRs. GATA1 recruits CBP [26], which induces open chromatin conformations by acetylating histones at specific residues. The GATA1-HS2 element contains a high-affinity, palindromic GATA1-binding element, and binds the transcription factors CP2 [31] and BCL11A [32]. We previously showed that GATA1-HS2 acts as an efficient, erythroid-specific enhancer in the context of both retroviral and lentiviral vectors [33,34]. In this study, we probed the activity of GATA1-HS2 in reducing silencing and position-dependent variability of transgene expression in the context of self-inactivating LV in which GATA1-HS2 was inserted in both LTRs. The modification had no influence on the target site selection of the vector, which maintained the typical lentiviral preferences for active genes, enriched in H3K36me3, H4K20me1, H3K9me1, and H3K27me1 histone modifications. Nevertheless, GATA1-HS2-flanked transgenes showed more persistent and position-independent expression characteristics, indicating an active role of the element in maintaining a permissive chromatin structure in the target cells.
Differentiation of HSCs into erythroblasts is marked by a gradual increase in chromatin condensation, which is associated with gene silencing at most loci. We therefore tested whether GATA1-HS2 could prevent inactivation during erythroid differentiation of transgenes delivered to HSCs and HPCs by a LV. Introduction of GATA1-HS2 in the viral LTRs effectively protects an expression cassette driven by a silencing-prone CMV promoter from silencing in erythroblasts differentiating from transduced murine HSCs and human CD34+ HPCs in vitro and in transplantation assays in vivo. The positive effect on transgene expression was not observed in the non-erythroblastic lineages, indicating that the presence of the erythroid-specific GATA1 transcription factor is essential for the activity of the GATA1-HS2 element. GATA1 is also expressed in megakaryocytes, eosinophils, and mast cells, which cannot be detected at high frequency in fresh BM cells. In vitro cell culture experiments will be necessary to unravel the potential GATA1-HS2 anti-silencing effect in these lineages.
We then tested the activity of GATA1-HS2 in the LTRs of the GLOBE LV, containing a human β-globin gene under the control of its promoter and a minimal version (HS2+HS3) of the β-globin LCR [5]. The GLOBE vector is able to correct thalassemia in a murine pre-clinical model of gene therapy [5] and in the erythroblastic progeny of human thalassemic HPCs [6]. The therapeutic efficacy is achieved even at a low dose of transduced HSCs by in vivo selection of transgene-expressing erythroblastic cells, possibly overcoming the negative influence of chromatin surrounding the LV proviruses. We observed that silencing of the β-globin transgene in the erythroid progeny of human HPCs and murine HSCs transduced with the G-GLOBE vector was significantly reduced compared to cells transduced with the GLOBE vector. As a consequence, the G-GLOBE vector allowed correction of murine thalassemia at a lower copy number and without inducing the in vivo selection observed with the GLOBE vector [5]. Therefore, reducing position-dependent variability of gene expression increases the vector's therapeutic potential, providing disease correction at lower vector copy number. Remarkably, the LTR modification does not impair the viral titer, probably because of the limited size of the GATA1-HS2 (2.0 × 10⁹ vs. 1.4 × 10⁹ TU/ml for a typical preparation of GLOBE and G-GLOBE, respectively).
The major difference between the two vectors was observed in the progeny of long-term repopulating HSCs, assayed as CFU-S in secondarily transplanted mice, indicating that the GATA1-HS2 provides a more effective long-term protection from silencing compared to the LCR alone. However, position effect variegation, resulting in expression or silencing in different colonies from CFU-S harboring the same provirus, was observed for both vectors, indicating that GATA1-HS2 increases the probability of expression but is unable to completely prevent silencing in the long-term progeny of transduced HSCs.
In order to understand the molecular basis of the protective effect of GATA1-HS2, we carried out ChIP studies on erythroblastic cells transduced with the GLOBE or G-GLOBE vectors. A typical function of enhancers is to prevent the formation of repressive chromatin structure by controlling histone acetylation and methylation around promoters. Binding of GATA1 to erythroid-specific enhancers during development or differentiation is a key factor in the initiation and maintenance of active chromatin structures [46]. The interaction of GATA1 with CBP/p300 suggests at least one mechanism by which HATs might be brought to specific sites [26,47]. By modifying chromatin-bound histones or GATA1 itself [28], acetylases enhance transcriptional activity of erythroid-specific loci [27]. In this context, GATA1 is more an "architectural" factor than a simple transcriptional activator. The essential role of GATA1 and CBP for the formation of an erythroid-specific acetylation pattern that is permissive for high levels of gene expression has been reported at the LCR and the murine β-major globin gene promoter [40,48]. Our study shows that binding of GATA1 and CBP to the GATA1-HS2 element correlates with increased CBP-mediated histone acetylation (H3K18, H4K5, and H4K8) at the level of the LTRs and the internal promoter. Although GATA1 and CBP also bind the LCR elements and the β-globin promoter within the vector, the presence of the GATA1-HS2 element at both sides of the transgene apparently increases CBP-mediated histone acetylation and spreads the acetylated histone domain to a chromatin region that encompasses the entire provirus. This might create a more stable, inheritable active chromatin configuration that renders the integrated transgene resistant to silencing in a large proportion of integration sites.
Our results suggest that GATA1 establishes an active epigenetic state across transgenes flanked by GATA1-HS2 elements in the context of LV. Therefore, this vector design represents a way to shield integrated transgenes from negative chromosomal position effects, improving the therapeutic potential of LV-mediated gene therapy approaches. Further studies addressing the impact of this lineage-specific enhancer on cellular transcriptional activity are necessary to assess the risk-benefit balance for potential clinical use.

Lentiviral vectors
The CMV-GFP lentiviral vector was previously described [35]. To construct the G-CMV-GFP vector, an EcoRI/SacI fragment containing part of the 3′ LTR of the lentiviral vector pHR2 [49] was subcloned into the PvuII site of the pUC19 plasmid. The plasmid was digested with EcoRV and PvuII, in order to generate a −418 to −18 deletion (where +1 is the transcription start site) in the U3 region of the LTR, and ligated with a 200-bp BamHI human genomic fragment (−856 to −655) containing the GATA1-HS2 [24] to obtain a chimeric self-inactivating (SIN) LTR. The EcoRI-SfiI fragment containing the modified 3′ LTR was then introduced into the EcoRI-SfiI sites of the CMV-GFP backbone [50], to obtain the G-CMV-GFP vector. To generate the G-GLOBE vector, the EcoRI-SfiI HS2 SIN LTR fragment was introduced into the SpeI-SfiI sites of the GLOBE vector backbone. Viral vector stocks were prepared by transient co-transfection of HEK293T cells using a three-plasmid system as previously described [5]. HEK293T cells were kindly provided by Luigi Naldini.

Hematological and cell phenotype analysis
Blood samples were collected from transplanted mice and Hb concentration, hematocrit, RBC and reticulocyte counts were measured as described [5]. RBC and CFU-S were stained with PE-conjugated anti-human β-globin antibody as described [5,51]. Mouse BM and spleen cell phenotyping was carried out using PE-conjugated anti-mouse TER-119 antibody. PE-conjugated anti-mouse CD45.1 and PerCP-conjugated anti-mouse CD45.2 antibodies were used to evaluate donor-host chimerism. Human cell surface phenotype and human cell engraftment in transplanted NOD/SCID mice were determined as described [33]. Unbound antibodies were removed by a final wash with PBS containing 1% FBS and cells were analyzed using a FACSCalibur cytofluorimeter (Becton Dickinson, Mountain View, CA). Ter119+ BM cells were purified as previously described [5]. Antibodies are listed in the Supplemental Materials and Methods (Text S1).

DNA analysis
Genomic DNA was isolated with the QIAGEN QIAamp DNA Mini Kit. GFP-specific primers were used to detect the presence of CMV-GFP and G-CMV-GFP vectors. Primers that anneal to the human HS2-HS3 LCR sequence were used to detect GLOBE and G-GLOBE vectors. Primer sequences are listed in Supplemental Materials and Methods (Text S1). The average vector copy number (VCN) was measured by qPCR as previously described [5].

Integration site analysis
Integration sites were cloned by linker-mediated PCR (LM-PCR) as previously described [38]. Briefly, genomic DNA was extracted from human HPCs, murine BM cells or CFU-S, digested with MseI and ligated to an MseI double-stranded linker. LM-PCR was performed with nested primers specific for the linker and the 3′ HIV LTR [38]. PCR products were shotgun-cloned into libraries of integration junctions, which were then sequenced to saturation. Sequences were mapped to the human (UCSC Human Genome Project Working Draft, hg18) or mouse genome (UCSC Mouse Genome Project Working Draft, mm9) by the BLAT genome browser. A matched random integration dataset was created for the human genome as described [38]. We annotated all genes having their transcription start site (TSS) within 50 kb of each integration/random site in either direction. Integrations were annotated as TSS-proximal when occurring within ±2.5 kb of a TSS, intragenic when occurring inside a gene >2.5 kb from a TSS, and intergenic in all other cases. In case of multiple transcript variants, we chose the longest isoform.
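The three annotation categories defined above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the `Gene` record, the `classify` function and the coordinates in the usage note are hypothetical, and the sketch assumes that each gene has already been reduced to its longest isoform.

```python
from dataclasses import dataclass

TSS_WINDOW = 2_500  # bp; the ±2.5 kb TSS-proximal window stated in the text


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record (one entry per gene, longest isoform)."""
    name: str
    chrom: str
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: leftmost on "+", rightmost on "-".
        return self.start if self.strand == "+" else self.end


def classify(chrom: str, pos: int, genes: list[Gene]) -> str:
    """Return 'TSS-proximal', 'intragenic' or 'intergenic' for one site."""
    on_chrom = [g for g in genes if g.chrom == chrom]
    # TSS-proximal takes priority: within ±2.5 kb of any TSS.
    if any(abs(pos - g.tss) <= TSS_WINDOW for g in on_chrom):
        return "TSS-proximal"
    # Otherwise intragenic if the site lies inside a gene body.
    if any(g.start <= pos <= g.end for g in on_chrom):
        return "intragenic"
    return "intergenic"
```

With a made-up gene spanning positions 10,000 to 50,000 on the plus strand of chr1, a site at 9,000 would be called TSS-proximal, a site at 30,000 intragenic, and a site at 100,000 intergenic.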

ChIP assay
ChIP assays were performed as described [44]. Samples were quantified using real-time SYBR Green PCR and analyzed using an Applied Biosystems 7900HT system. Serial dilutions of total input chromatin were used to generate a standard curve for each primer and sample set. Primer sequences and antibodies are listed in Supplemental Materials and Methods (Text S1).
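Quantification against an input standard curve can be sketched as follows. The source does not describe the fitting procedure, so this example assumes the common approach of a least-squares fit of Ct against log10 of the input amount. The function names and the dilution series in the usage note are illustrative only.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept.

    quantities: input amounts of the serial dilution (e.g. % of input)
    cts:        measured threshold cycles for those dilutions
    """
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Interpolate a sample's amount (same units as the curve) from its Ct."""
    return 10 ** ((ct - intercept) / slope)
```

For a hypothetical dilution series of 10, 1, 0.1 and 0.01% input with Ct values of 20.0, 23.3, 26.6 and 29.9, the fitted slope is -3.3 cycles per tenfold dilution. An immunoprecipitated sample's Ct is then converted to a fraction of input with `quantity_from_ct`.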

Statistical analysis
We used a 2-tailed Student's t test to determine whether hematological parameters, fractions of inputs and VCN differed between groups. Fisher's exact test was used to assess whether the difference between two proportions was significant. The statistical analyses were performed using GraphPad Prism Version 4.0b (GraphPad Software Inc., La Jolla, CA).

Figure S1 Increased transgene expression in the Ter119+ compartment of mice receiving G-CMV-GFP-transduced HSCs. GFP expression was analyzed by FACS in the erythroid Ter119+ fraction of BM from mice transplanted with CMV-GFP- and G-CMV-GFP-transduced HSCs, as described in the legend to

Figure S4 Correlation between CMV-GFP and G-CMV-GFP integration sites and gene activity. The bars show the percentage of distribution of expression values from Affymetrix HG-U133A microarrays of CD34+ cells at the time of transduction. Target genes are divided into four expression level categories: absent (black), low (below the 25th percentile in a normalized distribution; blue), intermediate (between the 25th and the 75th percentiles; yellow) and high (above the 75th percentile; red). The first bar (Chip CD34+) shows the distribution of the genes on the microarray of CD34+ cells at the time of transduction; the other three bars represent the expression values of genes at control random sites (Random) or genes targeted by CMV-GFP and G-CMV-GFP integrations. n represents the number of genes. (EPS)

Figure S5 Association between histone modifications and CMV-GFP and G-CMV-GFP integration sites. Total number of base pairs carrying the indicated histone modification/bound protein on each strand at ±1 kb from each CMV-GFP (green), G-CMV-GFP (blue) and random (grey box) insertion site. The distributions of the entire datasets are represented by box plots. Data were retrieved from ChIP-sequencing experiments performed in the genome of human CD34+/CD133+ HPCs [39].
Two-sample test statistics among CMV-GFP, G-CMV-GFP and random distributions for each modification indicate that comparisons between CMV-GFP and G-CMV-GFP are not statistically significant (non-parametric Mann-Whitney U test, not shown). All comparisons between LV and random distributions are statistically significant (*P<0.05; ****P<0.001; Random: n = 9974; CMV-GFP: n = 438; G-CMV-GFP: n = 305).
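Fisher's exact test, named under Statistical analysis for comparing two proportions, can be sketched in plain Python. The analyses in this study were run in GraphPad Prism, so this is only an illustration of the test itself. The function name is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention, assumed rather than taken from the source.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are the two outcomes
    (e.g. TSS-proximal vs. all other integrations).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table at least as extreme as the
    # observed one; the small tolerance guards against float ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

Calling `fisher_exact_two_sided(a, b, c, d)` with the counts of a 2x2 table (for example, TSS-proximal versus other integrations for two vectors) returns the two-sided P value.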