A Novel Co-Crystal Structure Affords the Design of Gain-of-Function Lentiviral Integrase Mutants in the Presence of Modified PSIP1/LEDGF/p75

Lens epithelium-derived growth factor (LEDGF), also known as PC4 and SFRS1 interacting protein 1 (PSIP1) and transcriptional co-activator p75, is the cellular binding partner of lentiviral integrase (IN) proteins. LEDGF accounts for the characteristic propensity of Lentivirus to integrate within active transcription units and is required for efficient viral replication. We now present a crystal structure containing the N-terminal and catalytic core domains (NTD and CCD) of HIV-2 IN in complex with the IN binding domain (IBD) of LEDGF. The structure extends the known IN–LEDGF interface, elucidating primarily charge–charge interactions between the NTD of IN and the IBD. A constellation of acidic residues on the NTD is characteristic of lentiviral INs, and mutations of the positively charged residues on the IBD severely affect interaction with all lentiviral INs tested. We show that the novel NTD–IBD contacts are critical for stimulation of concerted lentiviral DNA integration by LEDGF in vitro and for its function during the early steps of HIV-1 replication. Furthermore, the new structural details enabled us to engineer a mutant of HIV-1 IN that functions primarily when presented with a complementary LEDGF mutant. These findings provide a structural basis for the high affinity lentiviral IN–LEDGF interaction and pave the way for the development of LEDGF-based targeting technologies for gene therapy.


Introduction
Integration of reverse transcribed viral cDNA into the host cell genome is an essential step in the retroviral life cycle. This process is catalyzed by integrase (IN), a virus-derived enzyme, which carries out two separate reactions acting on both cDNA termini (reviewed in [1,2]). Firstly, 3′-processing takes place in the cytoplasm of the host cell, in which a di- or trinucleotide is hydrolytically removed from each cDNA end, exposing 3′-hydroxyl groups of invariant CA dinucleotides. The enzyme remains attached to both viral cDNA ends within a higher order pre-integration complex (PIC). The PIC is transported into the nucleus and, upon locating a suitable chromatin environment, the second reaction, strand transfer, ensues. During this step, the pair of hydroxyl groups produced during 3′-processing nick and join to opposing strands of the cellular DNA, four to six base pairs apart, depending on the retroviral genus. To complete the process, cellular enzymes repair the integration site, resulting in a stable provirus flanked by short duplications of the target DNA sequence.
Retroviral INs share a conserved three domain organization, each containing a central catalytic core domain (CCD), flanked by N- and C-terminal domains (NTD and CTD) [3][4][5]. The CCD spans the most conserved region of IN and bears close structural homology to prokaryotic transposases [6]. The enzyme active site is comprised of three invariant acidic residues (the D,DX35E motif) that coordinate a pair of Mg2+ cations during catalysis [7,8]. The NTD forms a three-helical bundle, which folds around a zinc atom coordinated by His and Cys residues of an HHCC motif [9,10]. The CTD features an SH3-like fold, is rich in basic residues and is likely involved in DNA binding [11,12]. Despite Herculean efforts directed towards characterization of this key antiviral drug target, the structure of a full-length retroviral IN remains elusive. The active form of retroviral IN is a tetramer [13][14][15], and a plausible tetramer model for the apoenzyme was proposed based on a crystal structure of a two-domain fragment of HIV-1 IN containing its NTD and CCD (IN NTD+CCD ) [16].
Lentiviral DNA integration critically depends on lens epithelium-derived growth factor (LEDGF) (reviewed in [17]). LEDGF tightly associates with chromatin and has been implicated in regulation of cellular gene expression, epigenetic chromatin modifications and apoptosis [18][19][20]. The host factor directly binds HIV-1, HIV-2, as well as other lentiviral INs and dramatically stimulates their strand transfer activity [21][22][23][24]. LEDGF tethers lentiviral IN to host chromatin in the nucleus [24][25][26][27] and plays a critical role in directing PICs to active genes during integration [28][29][30][31][32]. LEDGF contains a pair of small structural domains: an ~92-residue PWWP domain at its N-terminus, responsible for binding to an as yet unidentified component of chromatin, and the IN binding domain (IBD, residues 347-429) within its C-terminal portion [33][34][35]. The CCD and NTD of IN were both implicated in LEDGF binding: while the CCD is minimally sufficient, the NTD is required for high affinity binding [27,36]. Deletion of the HIV-1 IN NTD, or a mutation destabilizing zinc coordination within this domain (His-12 to Asn), greatly reduced the interaction with LEDGF [27]. A close homolog of LEDGF, hepatoma-derived growth factor-related protein 2 (HRP2), contains conserved PWWP and IBD-like domains. Although HRP2 is able to interact with HIV-1 IN and stimulate its enzymatic activity in vitro [33], it remains to be established whether it plays a role in lentiviral integration.
The structure of the LEDGF IBD, composed of a pair of α-helical hairpins, has been determined both separately and in complex with the HIV-1 IN CCD [34,36]. In the IN CCD :LEDGF IBD complex, Ile-365 of LEDGF inserts into a hydrophobic pocket at the IN dimer interface. The interaction is further bolstered by additional hydrophobic interactions between IN residue Trp-131 and LEDGF Phe-406 and Val-408. Asp-366 of LEDGF plays an important role in protein-protein recognition, forming a pair of essential hydrogen bonds with the main chain amides of IN residues Glu-170 and His-171 [36]. Mutation of LEDGF Asp-366 to Asn ablated the interaction with all lentiviral INs tested so far, indicating a common mechanism of recognition [22]. In this work we extend the known lentiviral IN-LEDGF interface to include contacts between the IN NTD and the IBD of LEDGF. This part of the protein-protein interface is essential for high affinity binding and stimulation of concerted DNA integration, and allows the design of complementary pairs of IN and LEDGF mutants for practical uses in gene therapy.

Crystallization and Structure Determination
To further characterize the interface between lentiviral INs and LEDGF, we obtained a complex of HIV-2 IN NTD+CCD and LEDGF IBD by co-expression, and crystallized it in two forms.
Although crystal form II diffracted to slightly lower resolution than form I (Table 1), it resulted in a higher quality structure. Firstly, form I displayed a higher degree of disorder and only three quarters of the asymmetric unit (ASU) could be unambiguously defined in electron density maps. Secondly, the twelve-fold noncrystallographic symmetry (NCS) in form II dramatically increased the observations:parameters ratio, resulting in a pseudo-high resolution structure. The structures observed in both crystal forms were overall equivalent, and the remainder of the paper will focus on form II. Snapshots of electron density for the CCD-IBD interface and Zn-His2Cys2 cluster of the NTD are shown in Figure S1A and S1B. Of note, all previous HIV-1 IN CCD crystal structures required Phe-185 to be mutated to Lys or His to improve protein solubility. Such a mutation was not necessary to crystallize the HIV-2 IN NTD+CCD :LEDGF IBD complex. Therefore, our structure is the first to include an HIV IN CCD with a Phe residue naturally occurring at this position (Figure S1C).
Based on elution from calibrated gel filtration columns and velocity analytical ultracentrifugation experiments, the purified HIV-2 IN NTD+CCD :LEDGF IBD complex behaved as a monodisperse species with a calculated molecular mass of ~60 kDa (data not shown). This size is consistent with a dimer of HIV-2 IN NTD+CCD plus one or two LEDGF IBD molecules, closely matching the basic building unit observed in the crystal (Figure 1A, referred to as the IN 2 LEDGF substructure). The substructure assembles further into closed trimers with a three-fold NCS (Figure 1B), and four such trimers accrue in the ASU to form a spherical structure containing 24 IN and 12 LEDGF chains (Figure 1C). The trimer is held together primarily via IN-IN (CCD-CCD, NTD-CCD and NTD-NTD) interactions (Figure 1B), and the total buried surface area between neighboring IN dimers is ~2,000 Å2.
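The chain counts quoted above follow directly from the hierarchy of the assembly. The short Python sketch below simply multiplies out the numbers given in the text (two IN chains and one IBD per substructure, three substructures per closed trimer, four trimers per ASU); the constant and function names are illustrative and not part of the original work.

```python
# Bookkeeping for the form II asymmetric unit (ASU), using only the
# stoichiometry stated in the text. Names are illustrative.
IN_PER_SUBSTRUCTURE = 2        # one IN NTD+CCD dimer
LEDGF_PER_SUBSTRUCTURE = 1     # one IBD bound per IN dimer in the crystal
SUBSTRUCTURES_PER_TRIMER = 3   # closed trimer with three-fold NCS
TRIMERS_PER_ASU = 4            # four trimers form the spherical particle


def asu_chain_counts():
    """Return the number of substructures and protein chains per ASU."""
    n_sub = SUBSTRUCTURES_PER_TRIMER * TRIMERS_PER_ASU
    return {
        "substructures": n_sub,
        "IN": n_sub * IN_PER_SUBSTRUCTURE,
        "LEDGF": n_sub * LEDGF_PER_SUBSTRUCTURE,
    }


if __name__ == "__main__":
    print(asu_chain_counts())
```

The twelve substructures per ASU are also the origin of the twelve-fold NCS exploited during refinement of form II.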

The IN-LEDGF Interface and the Role of the IN NTD
A total of ~1,450 Å2 of molecular surface is buried at the IN-LEDGF interface within the IN 2 LEDGF substructure. HIV-1 and HIV-2 INs share ~60% amino acid sequence identity over the span of their CCDs. Accordingly, the contacts between the HIV-2 IN CCD and the IBD are very similar to those observed in the HIV-1 IN CCD :LEDGF IBD structure, and have been extensively discussed elsewhere [36]. Significant changes to this part of the interface occur due to amino acid replacement at positions 128 and 129: HIV-2 encodes Met and Val, respectively, while HIV-1 carries Ala in both cases. The HIV-2 residues are nevertheless involved in similar hydrophobic interactions: the Met-128 side chain packs against Leu-368, Phe-406, and Val-408 of LEDGF, while Val-129 contributes to the hydrophobic pocket that buries LEDGF residue Ile-365 (Figure S2). As predicted [22,36], the critical LEDGF Asp-366 residue forms a bidentate hydrogen bond to the same backbone amides in HIV-2 and HIV-1 INs, even though the side chains at these positions differ between viruses (Asn-170 and Thr-171 in HIV-2; Glu-170 and His-171 in HIV-1).
In agreement with prior biochemical analyses [27], the NTD of IN makes extensive contacts with LEDGF. A constellation of acidic residues on the first helix (α1) of the NTD (Glu-6, Glu-10, and Glu-13) faces positively charged residues on the α4 helix of the IBD (Lys-401, Lys-402, Arg-404, and Arg-405). Side chains of LEDGF residues Lys-401, Arg-404, and Arg-405 are well ordered, and a well-defined salt bridge involves IN residue Glu-10 and Arg-405 of LEDGF (Figure 2A). The remaining side chains show varying degrees of order and appear to contribute to the overall charges of the interacting faces. The closely positioned and highly conserved IN residue Glu-11 is not involved in the interface and instead interacts with Lys-25 and Lys-186 of the same IN chain, supporting NTD structural integrity and hence overall stability of the IN 2 LEDGF substructure.

Acidic residues at IN positions 6, 10, and 13 are highly conserved among HIV isolates, and those at positions 10 and 13 tend to be acidic within Lentivirus, whose members retain at least one of the two negative charges (Figure 2B). Feline immunodeficiency virus (FIV) and maedi-visna virus (MVV) INs that lack negative charges at positions 10 and 13, respectively, contain additional acidic residues (Glu-7 and/or Glu-9), which should preserve the negative charge of the NTD face. Overall, lentiviral INs maintain two or more acidic residues within α1 of their NTDs, which can be predicted to contribute to the interaction with the positive face of the IBD.
Conversely, the complementary basic residues are conserved among all known LEDGF and HRP2 orthologs, with some variation only at the position corresponding to LEDGF Arg-405, where Arg or Lys is accommodated [22,24,33,34]. Of note, although some INs from nonlentiviral genera contain acidic residues at positions corresponding to residues 10 or 13 of lentiviral INs, these residues are not conserved within or among these genera (data not shown).
The pair of NTDs belonging to the IN dimer of the IN 2 LEDGF substructure exist in equivalent orientations with respect to the CCD dimer and are supported by contacts with the CCDs involving three salt bridges (Glu-11:Lys-186, Lys-20:Asp-193, and Glu-21:Arg-188) as well as hydrophobic stacking interactions involving Lys-14 and Tyr-15 of the NTD and Trp-131, Trp-132, and Lys-186 of the CCD. An almost identical NTD-CCD interface was observed in the crystal structure of the uncomplexed HIV-1 IN NTD+CCD tetramer [16] (Figure S3B, discussed in more detail below). Notably, the interface was formed between a CCD of one IN dimer (green in Figure S3B) and an NTD from another (yellow in Figure S3B), and so the two structures present an interesting case of domain swapping (Figure S3). The other NTD of the IN 2 LEDGF substructure (cyan in Figure S3A) is important in forming the closed trimers as it interacts with a second IN 2 LEDGF module through its A chain NTD and the IBD (Figure 1B).

Author Summary
Retroviruses crucially rely on insertion of their genomes into a host cell chromosome, and this process is carried out by the viral enzyme integrase. HIV and other lentiviruses also depend on LEDGF, a cellular chromatin-associated protein, which binds their integrase proteins and tethers them to a human chromosome. The interaction between integrase and LEDGF can potentially be exploited for directing integration of lentiviral vectors in gene therapy applications, as well as for development of antiretroviral drugs. Herein, we present a three-dimensional structure of a protein-protein complex containing a fragment of HIV integrase and the integrase-binding domain of LEDGF. Our structure elucidates the hitherto unknown LEDGF-integrase interface involving the amino terminal portion of the viral enzyme. Using a range of complementary approaches, we further show that these novel protein-protein contacts are essential for the function of LEDGF in HIV integration. The novel structural details will be very useful for the development of HIV inhibitors that target the integrase-LEDGF interaction. Furthermore, they enabled us to design a mutant of HIV integrase that depends on a reverse-engineered mutant of LEDGF, providing an inroad to the design of LEDGF-based lentiviral vector targeting strategies.

The NTD-IBD Interface Is Critical for the High Affinity IN-LEDGF Interaction and Affords Functional Charge Reversal of Opposing Molecular Faces
The domain-domain interfaces observed in the crystal structure were targeted by mutagenesis to investigate their functional relevance. Three LEDGF mutants were designed to eliminate or  [36], and K392E was made to disrupt a potential interaction between Lys-392 and Glu-6 within the secondary NTD-IBD interface that contributes to substructure trimerization (Figure 1B). The mutants were tested in a His6-tag pull-down assay for binding to the INs from HIV-1, HIV-2, and three nonprimate lentiviruses (bovine immunodeficiency virus [BIV], MVV, and equine infectious anemia virus [EIAV]). Consistent with earlier reports [22,27], wild type (WT) LEDGF was pulled down by all WT lentiviral INs, but not by the HIV-1 H12N mutant (Figure 3A). His-12 is involved in zinc coordination and is, therefore, critical for structural integrity of the NTD [37]. Also in agreement with prior work [22,34] The yeast two-hybrid technique proved more sensitive than pull-down analyses when applied to weak interactions between IN and LEDGF mutants [38]. Full-length HIV-1 IN fused to the DNA binding domain of Gal4 served as bait, and binding of the LEDGF IBD fused to the Gal4 transcription activation domain was reflected by β-galactosidase reporter gene activity. In this assay, LEDGF mutants K360E and K392E showed wild type levels of binding to HIV-1 IN, AAA bound at about 5% of WT, while similarly to D366N, the ESE and EEE mutants failed to interact at detectable levels (Figure 3B), essentially corroborating the results of the His6-tag pull-down experiments. To demonstrate that these observations were not due to off-site effects such as defective folding or reduced expression of the LEDGF mutants and to validate the novel NTD-IBD interface further, the complementary IN residues were mutated, producing a reversed charge D6K/E10K/E13K (KKK) HIV-1 mutant.
Impressively, KKK IN robustly interacted  strongly depend on the in vitro reaction conditions, with parameters such as the length of the donor DNA, enzyme source and concentration, and presence of crowding agents greatly affecting the outcomes [22,[39][40][41]. The use of a short mimic of the viral cDNA end (referred to as donor DNA substrate) and supercoiled target DNA conveniently allows discrimination between products of concerted and half-site integration ( Figure 4A).
LEDGF robustly promotes the strand transfer activities of divergent lentiviral INs in vitro, although the fidelity of LEDGF-mediated strand transfer varies for the different INs [21,22]. Intriguingly, while stimulating half-site strand transfer activity of HIV-1 IN under all reported conditions, the host factor has the capacity to either inhibit [42] or bolster [43] its concerted integration activity.
In the presence of 10 nM 500-bp donor substrate, 2.9 kb supercoiled plasmid DNA (pGEM), and WT LEDGF, WT HIV-1 IN carries out robust, predominantly half-site strand transfer [22] (Figure 4B, lane 3); half-site products migrate in agarose gels well above the open circular form of the target, while concerted integration products appear as linear (~4,000 bp) DNA species (Figure 4A and 4B). In agreement with earlier observations, the reaction was severely affected by the critical LEDGF D366N mutation (Figure 4B, lane 4). Despite greatly reduced binding affinity, the EEE LEDGF mutant retained the ability to stimulate half-site strand transfer activity, evident from accumulation of both donor-target and donor-donor products (lane 5). Concordantly, both WT and EEE LEDGF proteins stimulated the half-site activity of KK and KKK HIV-1 IN mutants. The KKK mutant, while significantly less active than WT IN, was somewhat more responsive to the mutant form of LEDGF (compare lanes 10 and 11). Based on these observations we conclude that the intact NTD-IBD interface and hence the full affinity of the IN-LEDGF interaction is not required for stimulation of half-site integration. This finding was not entirely unexpected, as HRP2, which binds HIV-1 IN with significantly lower affinity than LEDGF, is able to stimulate half-site integration in vitro to a similar extent [33]. Of note, because the ability of D366N LEDGF to bolster half-site strand transfer is severely repressed [34] (Figure 4B, lane 4), we argue that the stimulation of HIV-1 IN by LEDGF, and by EEE LEDGF, in particular, is strictly dependent on the direct protein-protein interaction. Concordantly, histidine and adenine auxotrophic AH109 yeast cells co-transformed with WT IN and EEE LEDGF Gal4 chimeras displayed a very slow growth phenotype on solid media lacking these nutrients, confirming a weak residual interaction (data not shown).
In contrast, evidence for an interaction between D366N LEDGF and WT HIV-1 IN was not observed, even under these conditions [22].
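The expected size of the linear concerted product quoted above can be checked with simple arithmetic. The Python sketch below is a back-of-the-envelope estimate that assumes the product length is just the sum of its DNA components; it ignores the nucleotides removed during 3′-processing and the short target-site duplication, and the function name is illustrative.

```python
def integration_product_lengths(target_bp, donor_bp):
    """Approximate DNA content (bp) of strand transfer products.

    Half-site: one donor joined to one strand of the circular target.
    Concerted: two donors joined to opposing strands, linearizing the target.
    Simplification: 3'-processing losses and the target-site
    duplication are neglected.
    """
    return {
        "half_site": target_bp + donor_bp,
        "concerted": target_bp + 2 * donor_bp,
    }


if __name__ == "__main__":
    # 2.9 kb supercoiled target and 500-bp donor, as in the assay above.
    print(integration_product_lengths(2900, 500))
```

With the 2.9 kb target and 500-bp donor this gives 3,900 bp for the concerted product, in line with the ~4,000 bp linear species seen on the gel.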
In agreement with earlier observations [22], EIAV IN was highly competent for concerted integration in the presence of LEDGF (Figure 4C, lane 3). Notably, the concerted strand transfer activity of EIAV IN was severely reduced when EEE LEDGF was used (lane 4). At the same time, the trace levels of half-site activity were not significantly affected. These results indicated that the NTD-IBD interface bears a special significance for concerted lentiviral DNA integration. In the course of optimizing HIV-1 IN strand transfer conditions, we discovered that increasing donor DNA concentration greatly enhanced the yield of LEDGF-dependent concerted integration products (refer to Text S1 and Figure S4 for validation of the assay). This novel assay afforded a convenient means for studying the effects of mutations on LEDGF and HIV-1 IN function (Figure 4D). As expected, the D366N LEDGF mutant, severely defective for IN binding, failed to stimulate concerted integration (Figure 4D, compare lanes 4 and 5). Reaction products formed in the presence of AAA, ESE, and EEE LEDGF mutants show that successive addition of net negative charge at this location decreases the ability of the cofactor to stimulate concerted integration, with hardly any product visible with the EEE mutant (Figure 4D, lanes 6-8). However, in agreement with the data discussed above (Figure 4B), these mutants retained the ability to stimulate half-site integration. As expected, LEDGF mutants K360E and K392E displayed WT activity (Figure 4D, lanes 9 and 10).
Significantly, both KK and KKK IN mutants gained concerted integration activity in the presence of EEE LEDGF. Furthermore, both IN mutants, and most dramatically KKK IN, favored the mutant LEDGF form ( Figure 4D, lanes 13-16). These results confirm that the mutant proteins are properly folded and that the effects observed are due to the modification of the protein-protein interface. They also suggest a possibility to engineer a gain of function HIV-1 IN mutant, active specifically in the presence of a complementary mutant of the host factor.
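The charge-reversal logic can be illustrated with a deliberately crude tally of formal side-chain charges. The Python sketch below uses only mutants whose substitutions are spelled out in the text: KKK (D6K/E10K/E13K) and KK (E10K/E13K) for HIV-1 IN, and EAE (K401E/K402A/R405E) for LEDGF. It assumes full ionization of Asp, Glu, Lys, and Arg and ignores geometry, Arg-404, and every other residue, so it is a toy model and not a predictor of binding; all names in the code are hypothetical.

```python
# Formal side-chain charges at neutral pH (toy model; His treated as neutral).
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}

# Residues at HIV-1 IN positions 6, 10, 13 and LEDGF positions 401, 402, 405.
IN_FACE = {"WT": "DEE", "KK": "DKK", "KKK": "KKK"}
LEDGF_FACE = {"WT": "KKR", "EAE": "EAE"}


def net_charge(residues):
    """Sum of formal charges for a string of one-letter residue codes."""
    return sum(CHARGE.get(aa, 0) for aa in residues)


def faces_attract(in_variant, ledgf_variant):
    """True when the two faces carry net charges of opposite sign."""
    q_in = net_charge(IN_FACE[in_variant])
    q_ledgf = net_charge(LEDGF_FACE[ledgf_variant])
    return q_in * q_ledgf < 0


if __name__ == "__main__":
    for in_v in IN_FACE:
        for ledgf_v in LEDGF_FACE:
            print(in_v, ledgf_v, faces_attract(in_v, ledgf_v))
```

In this tally only the WT/WT pair and the pairs combining a reversed-charge IN with the reversed-charge LEDGF come out as complementary, which mirrors the qualitative pattern of the concerted integration results.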

The NTD-IBD Interface Is Important for LEDGF Cofactor Function During HIV-1 Infection
To test the importance of the IBD-NTD interface in the context of viral replication, we used an established mouse LEDGF knockout model [29]. Although HIV-1 cannot complete its replication cycle in murine cells due to post-integration blocks, its reverse transcription and integration proceed normally and depend on LEDGF [29]. Ledgf-null mouse embryo fibroblasts (MEFs) transfected with a human LEDGF expression vector or its mutant forms were infected with single-round, vesicular stomatitis virus glycoprotein G (VSV-G)-pseudotyped HIV-1 vectors expressing a luciferase reporter gene (HIV-Luc), and the levels of luciferase activity in cell extracts were measured 44 h post infection. The WT and mutant LEDGF proteins were well expressed, and endogenous mouse LEDGF protein, as predicted, was not detected in cells transfected with the empty vector (Figure 5A). WT LEDGF expression increased the level of knockout cell infection five- to ten-fold as compared to cells carrying the empty vector. As expected [29], the D366N LEDGF mutant failed to stimulate the basal level of HIV-Luc infection. LEDGF AAA, K360E, and K392E by contrast supported similar levels of HIV-Luc infectivity as WT LEDGF, while its K401E/K402A/R405E (EAE, similar to ESE) and EEE mutants functioned at ~25% and 10%, respectively (Figure 5A). An additional LEDGF mutant combining the EEE and K360E mutations, and therefore lacking the Lys-360:Glu-167 IBD-CCD salt bridge (E4, Figure 1B), supported the lowest level of infectivity (Figure 5A). These results tie in well with the in vitro interaction and activity data, extending the biological significance of the NTD-IBD interface.
Release and infectivity of HIV-1 mutants carrying substitutions at IN positions 6, 10, and 13 were impaired to various extents. The KKK variant was most affected, failing to support any appreciable infectivity under a variety of conditions. This result was not unexpected, as often subtler changes in IN grossly affect various HIV-1 replication steps [44]. The nature of the defects observed with mutant viruses will be elaborated elsewhere. As demonstrated above, the double mutant carrying substitutions at the more conserved NTD acidic positions (E10K/E13K) was able to functionally interact with EEE LEDGF. Although KK HIV-Luc supernatants harvested from transfected 293T cells contained approximately 30% of reverse transcriptase (RT) activity compared to WT, suggesting subtle release or maturation defects, in agreement with the in vitro data, this virus was infectious when presented with LEDGF EAE or EEE ( Figure 5B

Discussion
In this work we extended the known lentiviral IN-LEDGF interface to include the interactions between the NTD of IN and the IBD of LEDGF. Since there is no evidence that the CTD of IN or LEDGF regions outside of the IBD are involved in the interaction, the contacts observed in our crystal structure may very well represent the entire IN-host factor interface. These novel Based on the three-fold symmetry of this assembly, we tentatively speculate that it could reflect the packing arrangement of IN molecules within retroviral capsids, which also feature three-fold symmetry [45]. In both crystals the closed trimers further associated into a spherical particle containing twenty-four IN chains, with their C-termini projecting inwards and the N-termini outwards. It remains to be determined if the higher order multimers of the IN 2 LEDGF substructure are biologically relevant. Such evidence could come, for example, from observing similar multimers in crystals of a divergent retroviral IN. Although we have not detected analogous large-sized complexes in solution, the calculated concentration of IN within retroviral capsids is very high [46], presenting an environment where it may very well adopt a paracrystalline state.
Mounting experimental evidence suggests that the active form of retroviral IN is a tetramer [13][14][15]. Based on a crystal structure of the HIV-1 IN NTD+CCD fragment, Craigie and colleagues proposed a plausible model for the synaptic IN tetramer (dimer of dimers) [16]. Notably, the positions of the IN NTDs relative to the CCD dimer and the supporting NTD-CCD contacts observed in our HIV-2 IN NTD+CCD :LEDGF IBD complex were seen in the earlier structure, where the NTDs mediated contacts between IN dimers [16]. One significant difference is that in the tetramer model, the NTD occupying the position primed for the interaction with the IBD is donated by the other IN dimer [16] (Figure S3A and S3B). Such domain swapping is quite common, representing one of the mechanisms for homomeric protein-protein interactions [47].
However, it is important to note an ambiguity of the NTD assignments in the HIV-1 IN NTD+CCD structure, which lacked appreciable electron density for the NTD-CCD linkers [16]. Nevertheless, the NTD-CCD interface observed in the two independent structures is almost certainly biologically relevant. If IN dimers do indeed swap their NTDs during tetramerization, the LEDGF binding platform would include a CCD dimer and an NTD from a separate IN dimer ( Figure S3C). Alternatively, LEDGF binding to an IN dimer would lock one NTD in the orientation primed for tetramerization ( Figure S3D). In either case, upon binding, the co-factor would enhance the thermodynamic stability of the tetramer. It is a tetramer of IN that mediates synapsis of a pair of donor DNA molecules [15] and, concordantly, the NTD-IBD contacts uncovered here are specifically required for concerted DNA integration. The model can also explain the surprising ability of LEDGF to inhibit HIV-1 concerted integration under some in vitro conditions [42]. LEDGF binding to a dimer of IN would fixate either one or both NTDs, preventing them from functionally interacting with a second IN dimer ( Figure S3). Thus, the concentration of IN, LEDGF and donor DNA substrate in the reaction mixture, as well as the order of component addition significantly influence the outcome of the reaction (Ref. [43] and data not shown). Of note, one recent study suggested that IN must exist in a lower multimeric state, likely a dimer, before interacting with the donor DNA for proper synaptic complex formation [48]. The enhanced concerted integration assay described herein will be very useful in future biochemical and structural studies of HIV-1 IN. Furthermore, other lentiviral INs, and in particular EIAV IN, carry out very efficient LEDGF-dependent concerted integration utilizing oligonucleotide donor DNA substrates under similar reaction conditions (data not shown). 
Using complementary approaches we demonstrated that the NTD-IBD interface is important for the functional Lentivirus-host interaction. Compared to the lock-and-key CCD-IBD interface, the contacts involving the NTD are based on charge-charge interactions and therefore lend themselves to complementary reverse-charge engineering. Since LEDGF has been shown to target lentiviral integration to active transcription units [28][29][30], it has been speculated that modified versions of the host factor could be used to control integration site selection. Safety of retroviral vectors would be greatly improved if they could be directed towards specific pre-determined loci and away from protooncogenes [49]. In one recent work, a fusion of the DNA binding domain of bacteriophage λ repressor and the IBD of LEDGF targeted HIV-1 integration near λ operator sequences in vitro [50]. One fundamental problem can thwart practical application of such approaches. When delivered into the cell, the targeting factor, associated with a limited number of chromosomal loci, will have to compete with a vast excess of endogenous LEDGF for the incoming preintegration complex. Knockout or knockdown of endogenous LEDGF is unlikely to be a practical or safe solution, especially considering its emerging role in epigenetics [18]. Here we demonstrated that an HIV-1 IN mutant carrying two reverse charge mutations within the NTD gained the ability to functionally interact with a modified version of LEDGF, while remaining basically unresponsive to the WT protein. Although the efficiency of the current system is somewhat modest, our results present a proof of principle that it is possible to engineer a viable complementary pair of IN and LEDGF mutants that could allow future development and practical applications of LEDGF-based lentiviral vector technologies.
[38] by swapping wild type IN and LEDGF fragments with their mutant forms.
For virus infectivity assays, mutations were introduced into the env-deficient HIV-1 proviral clone pNLX.Luc(R-) encoding HIV-1 with a gene for firefly luciferase in place of Nef (HIV-Luc) and pIRES2-eGFP-LEDGF, as previously described [29,52]. For production of SUMO protease, a PCR fragment encoding the catalytic core domain of Saccharomyces cerevisiae Ulp1 (residues 403-621) [53] was subcloned into pCPH6P-BIV-IN, replacing the BIV IN coding sequence to give pCPH6P-Ulp1CD. All DNA constructs made in this work were verified by sequencing to avoid inadvertent mutations.

Protein Expression and Purification
To obtain HIV-2 IN NTD+CCD :LEDGF IBD complex for crystallography, Escherichia coli PC2 cells [22] co-transformed with pCDF-HIV2-IN NTD+CCD and pES-IBD-3C were grown in LB medium in the presence of 50 µg/ml kanamycin and 100 µg/ml spectinomycin to an A600 of ~1.0, supplemented with 50 µM ZnCl2, and induced with 0.3 mM isopropyl-β-D-thiogalactopyranoside. Following 4 h induction at 22 °C, cells were harvested and stored at −80 °C. For purification, cells were lysed by sonication in 1 M NaCl, 20 mM imidazole, 0.2 mM PMSF, 50 mM Tris-HCl, pH 7.4, and the protein complex was captured on Ni-NTA agarose (Qiagen). Following extensive washing the protein was eluted in 1 M NaCl, 200 mM imidazole, 50 mM Tris-HCl, pH 7.4. The His6-SUMO tag was cleaved by overnight incubation with SUMO protease at 7 °C in the presence of 2 mM DTT. The sample was diluted with four volumes of 1 M NaCl, and the released His6-SUMO was depleted by adsorption onto a 5-ml HisTrap column (GE Healthcare). Residues C-terminal to the IBD were removed by overnight digestion with HRV14 3C protease in the presence of 10 mM DTT at 7 °C. The complex was then purified by chromatography over a Superdex-200 column (GE Healthcare) in 1 M NaCl, 50 mM Tris-HCl, pH 7.4, concentrated to 17 mg/ml, supplemented with 10 mM DTT and 10% glycerol, and flash-frozen in liquid nitrogen.
Non-tagged wild type and mutant HIV-1 IN proteins used in strand transfer assays were produced in PC2 cells transformed with pCPH6P-HIV1-IN or its mutant forms as previously described [22]. The His6-tag was removed by digestion with HRV14 3C protease. Non-tagged EIAV IN and C-terminally His6-tagged HIV-1, HIV-2, BIV, EIAV, and MVV IN proteins have been reported [22]. Wild type and mutant LEDGF proteins were made according to [51]. The SUMO protease Ulp1 catalytic domain fragment was produced in PC2 cells transformed with pCPH6P-Ulp1CD and purified as described in [53].
days, while those of form II appeared within a week and grew over several months to ~150×150×150 µm. Both types of crystals were cryoprotected in 25% glycerol, 2.6 M sodium acetate, 10 mM MgCl2, 0.1 M Bis-Tris propane-HCl, pH 7.0. Diffraction data were collected to a resolution of 3.0 Å (form I) and 3.2 Å (form II) at the European Synchrotron Radiation Facility (ESRF) beamline ID23-1 at 100 K. The data were processed using MOSFLM [54] and SCALA [55], part of the CCP4 project [56].
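The effect of the four-volume dilution on the buffer composition can be worked out with a simple mixing calculation. The Python sketch below assumes ideal mixing of one volume of Ni-NTA eluate with four volumes of 1 M NaCl and neglects the small volume contributed by the protease and DTT additions; the helper name and dictionary layout are illustrative. The rationale that a lower imidazole concentration helps the released His6-SUMO rebind the HisTrap column is an inference, not a statement from the protocol.

```python
def dilute(sample_mM, diluent_mM, diluent_volumes):
    """Concentrations (mM) after mixing 1 volume of sample with
    `diluent_volumes` volumes of diluent, assuming ideal mixing."""
    total = 1 + diluent_volumes
    species = set(sample_mM) | set(diluent_mM)
    return {
        name: (sample_mM.get(name, 0.0)
               + diluent_volumes * diluent_mM.get(name, 0.0)) / total
        for name in species
    }


if __name__ == "__main__":
    eluate = {"NaCl": 1000.0, "imidazole": 200.0, "Tris-HCl": 50.0}
    diluent = {"NaCl": 1000.0}
    print(dilute(eluate, diluent, 4))
```

Under these assumptions the salt stays at 1 M while imidazole drops five-fold, from 200 mM to 40 mM, and Tris-HCl from 50 mM to 10 mM.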

Crystallization and Structure Determination
Crystal form I belonged to the space group P321 with unit cell parameters a = b = 210.5 Å, c = 162.6 Å, α = β = 90°, and γ = 120°. The structure was solved by molecular replacement with MOLREP [57] using three individual search models in the following order. First, three dimers of CCDs (chains A and B from PDB ID 2b4j) were located, forming a trimer of dimers, followed by a single IBD molecule (chain C from 2b4j) per CCD dimer, and finally the NTDs (chain A residues 1-45 from PDB ID 1k6y) [16,36]. After rigid body refinement, it became clear that a fourth IN dimer with its corresponding IBD molecule was located out of the plane of the original trimer of dimers, and that this new dimer formed a similar trimer of dimers via the crystallographic three-fold axis. The asymmetric unit (ASU) contained twelve protein chains and over 70% solvent. The structure was refined using REFMAC [58] and PHENIX [59], including translation, libration and screw (TLS) refinement, with manual model building in COOT [60].
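The form I cell parameters and the quoted solvent content can be cross-checked with a few lines of arithmetic. The sketch below is not part of the original analysis; it assumes the standard Matthews relation (solvent fraction = 1 − 1.23/V_M, which itself assumes a typical protein partial specific volume) and uses the fact that P321 has six asymmetric units per unit cell. The function names are hypothetical.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume (cubic angstroms) from edges (angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def mass_per_asu(volume, n_asu, solvent_fraction):
    """Protein mass (Da) per asymmetric unit implied by a given solvent fraction.

    Uses the Matthews relation: solvent_fraction = 1 - 1.23 / V_M,
    where V_M is the crystal volume per dalton of protein.
    """
    v_m = 1.23 / (1.0 - solvent_fraction)  # cubic angstroms per Da
    return volume / (n_asu * v_m)

# Form I cell as reported in the text; P321 has 6 asymmetric units per cell.
V_FORM_I = cell_volume(210.5, 210.5, 162.6, 90, 90, 120)

# "Over 70% solvent" puts an upper bound on the protein mass in the ASU.
MASS_LIMIT = mass_per_asu(V_FORM_I, n_asu=6, solvent_fraction=0.70)

print(f"cell volume: {V_FORM_I:.3e} cubic angstroms")
print(f"protein mass per ASU at 70% solvent: {MASS_LIMIT / 1000:.0f} kDa")
```

Under these assumptions, a solvent content above 70% corresponds to less than roughly 254 kDa of protein per asymmetric unit, which is the bound the twelve chains must satisfy.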
Form II crystals belonged to the space group P2₁2₁2₁ with unit cell parameters a = 201.4 Å, b = 202.5 Å, c = 280.5 Å, and α = β = γ = 90°. A high degree of non-crystallographic symmetry (NCS) was expected due to the large unit cell. Therefore, to reduce potential bias in Rfree estimation, the test reflection set was chosen in thin shells using SFTOOLS, part of the CCP4 program suite [56]. The structure was solved by molecular replacement in PHASER [61], using a search model containing the dimeric IN assembly plus an associated LEDGF chain as observed in form I. The structure was refined using simulated annealing in PHENIX and restrained refinement in REFMAC, imposing tight 12-fold NCS restraints. Positive Fo-Fc density was observed at the known binding sites for zinc and magnesium, and the corresponding atoms were added to the structure. Details on data collection and refinement statistics are shown in Table 1. Diffraction data and the resulting structure derived from crystal form II were deposited in the Protein Data Bank (PDB ID 3f9k), and those for form I are available upon request.
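The reason for choosing test reflections in thin shells is that NCS-related reflections share the same resolution, so assigning whole shells keeps such related reflections together in either the working or the test set. The following is a minimal sketch of that idea only; it is not the SFTOOLS algorithm, and the function name, the shell count and the 5% test fraction are assumptions made for illustration.

```python
import random

def thin_shell_free_flags(inv_d2, n_shells=100, free_fraction=0.05, seed=0):
    """Return one boolean per reflection (True = test-set reflection).

    inv_d2: the 1/d^2 value of each reflection. Reflections are sorted by
    resolution and cut into shells of equal count; whole shells are then
    drawn at random until about free_fraction of the data is flagged.
    """
    order = sorted(range(len(inv_d2)), key=lambda i: inv_d2[i])
    shell_size = max(1, len(order) // n_shells)
    shells = [order[i:i + shell_size] for i in range(0, len(order), shell_size)]

    rng = random.Random(seed)
    rng.shuffle(shells)

    flags = [False] * len(inv_d2)
    target = free_fraction * len(inv_d2)
    n_free = 0
    for shell in shells:
        if n_free >= target:
            break
        for i in shell:
            flags[i] = True
        n_free += len(shell)
    return flags
```

A random per-reflection selection would instead scatter NCS-related reflections across both sets, which is the source of the bias the thin-shell choice is meant to reduce.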

Protein-Protein Interaction and Strand Transfer Activity Assays
His6-tag pull-down and yeast two-hybrid assays were performed as described previously [22,34,38]. The indicator S. cerevisiae strains Y187 and AH109 were from BD Biosciences. Untagged, recombinant INs were used in all strand transfer assays. HIV-1 and EIAV integration assays with the respective 500-bp and 225-bp RU5 donor DNA substrates were carried out as previously described [22]. HIV-1 donor DNA was obtained as a PCR product using Pfu DNA polymerase (Stratagene) with the primer pair 5′-GGACTGAGGGGCCTGAAATGAGC/5′-ACTGTTGGGTGTTCTTCACCGCCCCGCGAGCT and pU3U5 template; primers 5′-TTAAGTTGGGTAACGCCAGG/5′-ACTGTAGGATCTCGAACAGAC and pU3U5-EIAV template were used to make the EIAV donor [22,46].
For enhanced HIV-1 concerted integration assays, donor DNA was prepared by annealing oligonucleotide 5′-CCTTTTAGTCAGTGTGGAAAATCTCTAGCA (pre-processed) or 5′-CCTTTTAGTCAGTGTGGAAAATCTCTAGCAGT (non-processed) to the complementary strand 5′-ACTGCTAGAGATTTTCCACACTGACTAAAAGG, creating a 32-bp mimic of the pre-processed or non-processed HIV-1 U5 cDNA terminus, respectively. Two µl of HIV-1 IN in 750 mM NaCl, 2 mM DTT, 20 mM Tris-HCl, pH 7.4 (DB) was added to 36 µl of master mix containing 0.55 µM donor DNA and 0.30 µg supercoiled pGEM-9Zf(-) target DNA in 25.3 mM NaCl, 5.5 mM MgSO4, 11 mM DTT, 4.4 µM ZnCl2, 22 mM HEPES-NaOH, pH 7.4. Following a 3-5 min pre-incubation at room temperature, reactions were supplemented with 2 µl of LEDGF in DB and allowed to proceed at 37 °C for 30 min. The final concentrations of HIV-1 IN and LEDGF were both 0.6 µM. The reactions were stopped by addition of 25 mM EDTA and 0.5% SDS. The products were deproteinized by digestion with 30 µg Proteinase K for 1 h at 37 °C, ethanol precipitated, resolved by electrophoresis in 1.5% agarose gels, and detected using ethidium bromide.
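Two details of this set-up lend themselves to a quick check: that the three oligonucleotides form the stated duplexes, and what the final reaction composition is once IN and LEDGF are added. The sketch below is illustrative only. It assumes the sequences are read with the line-break hyphens removed, that volumes are in microlitres, and that the donor DNA, zinc and protein concentrations are micromolar; all variable and function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

# Oligonucleotides from the text, written 5' to 3'.
U5_PROCESSED = "CCTTTTAGTCAGTGTGGAAAATCTCTAGCA"    # ends in the invariant CA
U5_BLUNT = "CCTTTTAGTCAGTGTGGAAAATCTCTAGCAGT"      # non-processed strand
U5_BOTTOM = "ACTGCTAGAGATTTTCCACACTGACTAAAAGG"     # complementary strand

# The non-processed strand pairs fully with the bottom strand; the
# pre-processed strand is the same sequence minus the 3' GT dinucleotide.
IS_FULL_DUPLEX = reverse_complement(U5_BLUNT) == U5_BOTTOM
IS_PROCESSED_FORM = U5_BLUNT[:-2] == U5_PROCESSED

# Volumes in microlitres (assumed): IN in DB, master mix, LEDGF in DB.
V_IN, V_MIX, V_LEDGF = 2.0, 36.0, 2.0
V_TOTAL = V_IN + V_MIX + V_LEDGF

MASTER_MIX = {"NaCl_mM": 25.3, "MgSO4_mM": 5.5, "DTT_mM": 11.0,
              "ZnCl2_uM": 4.4, "HEPES_mM": 22.0, "donor_uM": 0.55}
DB = {"NaCl_mM": 750.0, "DTT_mM": 2.0, "Tris_mM": 20.0}

# Each component is diluted by (volume added) / (total volume).
FINAL = {k: v * V_MIX / V_TOTAL for k, v in MASTER_MIX.items()}
for name, stock in DB.items():
    FINAL[name] = FINAL.get(name, 0.0) + stock * (V_IN + V_LEDGF) / V_TOTAL

# Protein stock needed in 2 microlitres to reach 0.6 micromolar final.
PROTEIN_STOCK_UM = 0.6 * V_TOTAL / V_IN

for name, value in sorted(FINAL.items()):
    print(f"{name}: {value:.2f}")
print(f"protein stock: {PROTEIN_STOCK_UM:.1f} uM")
```

Read this way, the master mix is a 1.1× formulation: after dilution to 40 µl it gives about 20 mM HEPES, 5 mM MgSO4, 10 mM DTT, 4 µM ZnCl2 and 0.5 µM donor DNA, with the DB additions raising NaCl to roughly 100 mM.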
For sequencing analysis, reaction products migrating as a band of ~3 kb were isolated from a 1.5% agarose gel and converted into fully double stranded forms by treatment with Φ29 DNA polymerase (New England Biolabs) in the presence of 500 µM dNTPs. The DNA was then 5′-phosphorylated and ligated to a blunt-ended 1.2-kb PCR fragment spanning the Tn5 aminoglycoside-3′-O-phosphotransferase gene flanked by KpnI sites [22]. Competent DH5α E. coli cells were transformed with the ligation mixture. Plasmids were isolated from individual kanamycin-resistant colonies, and those releasing fragments of expected sizes (~3 and 1.2 kb) upon digestion with KpnI were sequenced using primers annealing within the Tn5-derived fragment.

Infectivity Assays
Single cycle infectivity assays were done as described elsewhere [29,62]. Briefly, VSV-G-pseudotyped HIV-Luc viruses carrying various IN alleles were generated by transfecting 293T cells and titered using a 32P-based reverse transcriptase (RT) assay. The Ledgf-null E2(−/−) mouse embryo fibroblasts (MEFs) transformed with simian virus 40 large T antigen were previously described [29]. Cells transfected with empty, WT, or mutant LEDGF pIRES2-eGFP expression vectors and sorted by FACS to enrich the GFP-positive population were lysed for western blot analyses or plated for infections. Ten hours after plating, the cells were infected with equal RT-cpm of HIV-Luc variants. Cells were lysed 44 h post infection, and luciferase activity was measured relative to the total protein content of the lysates.
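The readout described above amounts to two normalizations: inocula are matched by RT activity, and luciferase signal is divided by lysate protein. A minimal sketch of that arithmetic follows; the function names are hypothetical and the numbers are invented placeholders chosen only to show the calculation, not measurements from this study.

```python
def inoculum_volume(target_rt_cpm, stock_rt_cpm_per_ul):
    """Volume of virus stock (microlitres) that delivers the target RT-cpm."""
    if stock_rt_cpm_per_ul <= 0:
        raise ValueError("stock RT activity must be positive")
    return target_rt_cpm / stock_rt_cpm_per_ul

def normalized_luciferase(rlu, protein_ug):
    """Luciferase signal per microgram of total lysate protein."""
    if protein_ug <= 0:
        raise ValueError("protein content must be positive")
    return rlu / protein_ug

def percent_of_reference(sample, reference):
    """Express a normalized signal as a percentage of a reference sample."""
    return 100.0 * sample / reference

# Placeholder values (hypothetical, for illustration only).
WT_SIGNAL = normalized_luciferase(rlu=2.0e6, protein_ug=20.0)
MUTANT_SIGNAL = normalized_luciferase(rlu=1.5e5, protein_ug=15.0)
RELATIVE = percent_of_reference(MUTANT_SIGNAL, WT_SIGNAL)
VOLUME = inoculum_volume(target_rt_cpm=1.0e6, stock_rt_cpm_per_ul=5.0e4)

print(f"mutant infectivity: {RELATIVE:.1f}% of reference")
print(f"inoculum volume: {VOLUME:.1f} microlitres")
```

Dividing by protein content corrects for differences in cell number between wells, so that a lower signal reflects reduced infection rather than fewer cells.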

Supporting Information
Text S1 Validation of LEDGF-Dependent
Figure 1A). The NTD colored dark blue belongs to a separate IN 2 LEDGF unit; its contacts with the IBD stabilize the closed trimer (Figure 1B
NTDs are connected to the same-chain CCDs by flexible hinges (black curves). Association of the two dimers during synaptic complex assembly involves a swap of an NTD from each dimer to interact with a CCD from the opposing dimer. When loaded, LEDGF would engage the CCDs from one dimer and an NTD from another, effectively stabilizing the complex. It is also possible, speculatively, that the other NTD interacts with the IBD, as seen with the dark blue NTD in (A). (D) Alternative model for the assembly of the synaptic complex in which there is no NTD swap. In this case the IBD of LEDGF stabilizes the synaptic complex by locking the NTDs in the correct orientation for tetramerization.
A lane with a sample identical to that in lane 1 was excised from the gel. In this gel slice, two wells were created for parallel separation of the 1-kb DNA ladder and another aliquot of sample 1. The DNA was then separated in the perpendicular direction in a 1.6% agarose gel and visualized with ethidium bromide. Projected migrations of the linear DNA standards are indicated with red crosses. Note migration of the full-site (FS) product along the arc defined by the linear DNA size standards, while circular DNA species (half-site [HS], o.c. and s.c. target DNA) appear above the arc. Products of multiple full-site events are expected to result in a gamut of linear DNA species of variable lengths migrating as smears in agarose gels; akin to the full-site product, these species distribute along the arc. Found at: doi:10.1371/journal.ppat.1000259.s005 (2.33 MB TIF)