Structural and Functional Role of INI1 and LEDGF in the HIV-1 Preintegration Complex

Integration of the HIV-1 cDNA into the human genome is catalyzed by the viral integrase (IN) protein. Several studies have shown the importance of cellular cofactors that interact with integrase and affect viral integration and infectivity. In this study, we produced a stable complex between HIV-1 integrase, viral U5 DNA, the cellular cofactor LEDGF/p75 and the integrase binding domain of INI1 (INI1-IBD), a subunit of the SWI/SNF chromatin remodeling factor. The stoichiometry of the IN/LEDGF/INI1-IBD/DNA complex components was found to be 4/2/2/2 by mass spectrometry and Fluorescence Correlation Spectroscopy. Functional assays showed that INI1-IBD inhibits the 3′ processing reaction but does not interfere with specific viral DNA binding. Integration assays demonstrate that INI1-IBD decreases the amount of integration events but inhibits by-product formation such as donor/donor or linear full site integration molecules. Cryo-electron microscopy locates INI1-IBD within the cellular DNA binding site of the IN/LEDGF complex, constraining the highly flexible integrase in a stable conformation. Taken together, our results suggest that INI1 could stabilize the PIC in the host cell, by maintaining integrase in a stable constrained conformation which prevents non-specific interactions and auto integration on the route to its integration site within nucleosomes, while LEDGF organizes and stabilizes an active integrase tetramer suitable for specific vDNA integration. Moreover, our results provide the basis for a novel type of integrase inhibitor (conformational inhibitor) representing a potential new strategy for use in human therapy.


Introduction
During the early events of viral replication the RNA genome is converted into its cDNA copy which then, upon interaction with cellular and viral proteins, generates the pre-integration complex (PIC). Cellular trafficking along the microtubule network transports the PIC to the nuclear envelope. The lentivirus subfamily PICs exhibit karyophilic properties which enable them to enter the nucleus through the nuclear pore. To establish a productive infection, the viral cDNA must subsequently be integrated into the host genome by the integrase protein (IN), which is a permanent component of the virion and the PIC. IN performs several important steps in the life cycle of retroviruses. It was shown to be involved in several steps of HIV-1 replication, such as uncoating [1], reverse transcription [2], nuclear import [3], chromatin targeting [4] and integration [5]. Viral components such as IN cannot perform all these functions by themselves and need to recruit host cell proteins to efficiently carry out the different activities. The molecular details and temporal sequence of these processes, and particularly the role of cellular co-factors, remain largely unknown.
The IN enzyme consists of three structural and functional domains, namely the N-terminal zinc binding domain (residues , the central catalytic core domain (CCD; residues  containing the D, D, E triad that coordinates divalent ions and the C-terminal domain (residues 213-288). A systematic study of mutants in the catalytic core identified a mutation (F185K) which greatly increases its solubility [6]. This mutant was used for high resolution structural studies. Several partial structures of HIV-1 IN have been solved, namely the CCD domain alone [7][8][9][10], as well as the CCD domain combined with the C-terminal domain [11] or the N-terminal domain [12] and finally, the CCD in complex with the IN binding domain of LEDGF [13]. Structures of IN from other retroviruses have also been solved [14]. In these structures, the catalytic core is organized into a highly conserved dimer except for the IN encoded by the Rous associated virus type-1 [15], whereas the position of the N-terminal and C-terminal domains relative to the catalytic core domain is extremely variable (Fig. S1). Recently, the structures of two functional integration units have been solved, namely the crystallographic structure of the Prototype Foamy Virus (PFV) IN/DNA complex [16] and the cryo-electron microscopy (cryo-EM) structure of the HIV-1 IN/LEDGF/DNA complex [17]. To validate the comparison between the two structures we solved the EM structure of the PFV IN tetramer (Fig. S2 and methods S1). The X-ray structure of the PFV IN could be readily fitted in the envelope, showing that the overall arrangement of the IN domains does not depend on the method used (EM or X-ray). Both structures showed that the functional unit is composed of an IN tetramer. The comparison of the two structures revealed a different organization of the monomers in the tetrameric unit (Fig. S3).
Moreover, most of the residues shown to be important for DNA binding and/or 3′ processing in the HIV-1 integrase model constructed using the PFV IN structure [18] also interact with DNA in our EM model [17]. Taken together, the data reveal a high flexibility in the linkers between the IN domains as well as in their oligomeric organization. This inherent flexibility explains the propensity of IN to interact with multiple partners and to intervene in numerous biological functions by exposing and reshaping interaction surfaces [19][20][21]. The final arrangement of the domains is probably strongly dependent on the interaction with protein co-factors and on IN function in the infected cell (microtubule migration, nuclear internalization, chromatin targeting and integration). Several cellular co-factors have been shown to be important for HIV-1 infection and to interact with HIV-1 IN [22][23][24][25][26]. Among them are the INtegrase Interactor protein 1 (INI1), which is a homolog of yeast SNF5, the core component of the SWI/SNF chromatin remodeling complex [27], and the Lens Epithelium-Derived Growth Factor (LEDGF) [28], a transcriptional co-activator. The function of LEDGF in HIV-1 infection is to target IN to chromosomes of infected cells [29]. Its expression is required for proviral integration and subsequent production of HIV-1 virions [30]. At the structural level, the interaction with LEDGF was shown to produce an active form of IN by maintaining a stable HIV-1 IN tetramer [17].
INI1 was the first protein shown to interact with IN [27]. The 385-residue INI1 contains a C-terminal SNF5 homology domain with 3 highly conserved sequence motifs: repeats 1 and 2 and a coiled-coil motif (Fig. S4). Repeat 1 was found to be necessary and sufficient to bind to IN [31]. The role of INI1 in the HIV-1 replication cycle remains controversial, but it has been clearly established that it acts on both the early and late stages of viral infection, probably by distinct mechanisms. In the late stage, INI1 may facilitate proviral transcription by enhancing Tat function [32][33][34][35][36]. Indeed, INI1 could act as a regulating factor to initiate one of two mutually exclusive transcription programs after integration, namely post-integration latency or high-level, Tat-dependent gene expression [37]. It has also been shown that over-expression of the INI1 integrase binding domain in the cell inhibits HIV-1 assembly by specifically binding to the viral gag-pol protein [38]. Finally, INI1 was shown to be incorporated in mature virions with a stoichiometry of 1 INI1 for 2 IN molecules [39] and to incorporate the SAP18 HDAC complex into virions [40]. INI1 has been shown to both increase [38,41,42] and inhibit [43] viral replication. In vitro experiments on reconstituted nucleosomes have demonstrated that purified SWI/SNF complexes stimulate viral DNA integration by restoring the DNA accessibility to IN via nucleosome remodeling [41].
In order to clarify the INI1-mediated inhibition and/or activation functions in the early stage of HIV-1 infection, we analyzed the structure-function relationships of a quaternary complex comprising the full length wild type HIV-

Purification and Characterization of the IN/LEDGF/INI1-IBD Complex
In order to study the effect of INI1 on the structure and function of IN we intended to add a fragment of INI1 onto the previously characterized, highly soluble and active IN/LEDGF complex. The method used to produce the complexes yields a soluble, homogeneous and active entity. This is in sharp contrast with most of the earlier work on full length integrase which is known to be prone to aggregation. A soluble fragment of INI1 containing the bona-fide integrase binding domain was identified and produced using a structural genomic strategy. The amino acid sequence of INI1 was analyzed by a combination of programs, including multiple alignment [44] and various prediction tools [45] to define domain limits. A total of 16 fragments were cloned in fusion with 3 different affinity tags (MBP, GST, HIS) and were tested for expression and solubility. The INI1 fragment spanning residue 174 to 289 in fusion with 6 histidines was selected (Fig. S4).
Full length IN, full length LEDGF and the INI1 (174-289) (INI1-IBD) fragment were purified separately and solubilized using high salt and CHAPS. The IN/LEDGF/INI1-IBD complex was formed upon removal of the solubilizing agents by dialysis and was purified to homogeneity by affinity chromatography and gel filtration, which showed a sharp and symmetric peak (Fig. S5A). The stoichiometry of the partners was determined by High-Mass MALDI ToF mass spectrometry analysis [46]. Control experiments identified the mass of the three components: IN (MH+ = 32.8 kDa), LEDGF (MH+ = 60.4 kDa) and His6-INI1-IBD (MH+ = 17.2 kDa) (Fig. S5B). In a second step, the purified complex was chemically cross-linked prior to mass spectrometry. Trace amounts of several proteins and complexes were detected:  (Fig. S5C). Higher molecular weight complexes in the range of 500-1000 kDa were not detected, indicating that the complexes did not aggregate. These experiments show that a stable complex is formed in which the previously characterized [4IN·2LEDGF] complex incorporates 2 molecules of INI1-IBD.  (Fig. S6A), fully consistent with the diffusion of a DNA duplex of 26 kDa [47]. The distribution of brightness (Fig. S6B), obtained from a large number of measurements (n < 60), was nearly mono-disperse with a median value of 0.77 ± 0.07 kHz per U5 vDNA-TXR duplex. Addition of IN/LEDGF to the U5 vDNA-TXR duplex solution shifted the autocorrelation curve to longer diffusion times (Fig. S6C), indicating an increase in the molecular weight of the diffusing species, in line with an interaction of the U5 vDNA-TXR duplex with IN/LEDGF. Moreover, the observed increase in the Y-axis intercept of the autocorrelation curve, which is inversely proportional to the number of diffusing species, indicated a decrease in the total number of diffusing species (Fig. S6A). This suggests that more than one U5 vDNA-TXR duplex interacts with each IN/LEDGF complex.
According to the binding experiments (see below), a fraction of the U5 vDNA-TXR duplexes in solution is likely not bound to the IN/LEDGF complexes under the FCS conditions. Therefore, to take into account the presence of both free and bound vDNA-TXR molecules, the autocorrelation curves were fitted by a two-population model (Eq. 2 in methods S1). To limit the number of variables in the fitting process, the value of the correlation time τD1 for the free molecules was fixed, using the aforementioned value obtained with the U5 vDNA-TXR duplex alone. From the fit, the value of the diffusion constant of the U5 vDNA-TXR/IN/LEDGF complexes (D2) was found to be 51 ± 0.2 µm²·s⁻¹, suggesting that the molecular weight of the complexes is about 300 kDa. Moreover, the ratio of brightness between the complex of U5 vDNA-TXR duplex with IN/LEDGF and free U5 vDNA-TXR duplex (B2/B1) was found to be 1.96 ± 0. 62 (40 bp) of the same sequence as for the FCS experiments was modified at one of its 5′ ends by 6-Carboxyfluorescein (6FAM). As expected, an increase in the fluorescence anisotropy was observed upon addition of increasing concentrations of protein to a fixed concentration of DNA. The dissociation constant (Kd) was calculated using the Scatchard equation rewritten to fit the anisotropy data [48] as described in the methods S1.  (Fig. 1A, B). These values are similar to those found in previous studies [49]. To assess the specificity of the binding sites for the U5 vDNA duplex, competition experiments with an excess of non-fluorescent specific and non-specific DNA duplexes were performed. While the latter induced no shift in the titration curve, an excess of non-fluorescent specific U5 vDNA duplex shifted the binding curve, in line with a competition between fluorescent and non-fluorescent specific U5 vDNA duplexes for the binding sites. This indicates the specificity of both the IN/LEDGF and IN/LEDGF/INI1-IBD complexes for U5 vDNA duplexes.
Taken together, these data indicate that the IN/LEDGF complexes, with and without INI1-IBD, specifically bind the U5 DNA duplexes with binding constants that differ only by a factor of three.
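The relation between a titration curve and a dissociation constant can be illustrated with a minimal sketch. It assumes a simple 1:1 binding model with ligand depletion and an anisotropy signal that is a linear mixture of free and bound DNA (no change in fluorescence intensity upon binding). This is a standard textbook form and not necessarily the rewritten Scatchard equation of [48]; the function names and all numerical values are hypothetical.

```python
import math

def fraction_bound(protein_total, dna_total, kd):
    """Fraction of DNA bound for 1:1 binding, accounting for ligand depletion.

    All concentrations share one unit (for example nM).
    """
    s = protein_total + dna_total + kd
    # Physically meaningful root of the quadratic mass-action equation.
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total

def anisotropy(protein_total, dna_total, kd, r_free, r_bound):
    """Observed anisotropy as a linear mixture of free and bound DNA."""
    fb = fraction_bound(protein_total, dna_total, kd)
    return r_free + (r_bound - r_free) * fb

# Hypothetical titration: 50 nM labeled DNA, Kd = 100 nM.
for protein in (0.0, 50.0, 125.0, 500.0, 2000.0):
    r = anisotropy(protein, 50.0, 100.0, r_free=0.05, r_bound=0.20)
    print(f"{protein:7.1f} nM protein -> r = {r:.3f}")
```

In this model a three-fold difference in Kd shifts the midpoint of the curve along the protein concentration axis without changing its plateau values.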

Influence of INI1-IBD and LEDGF on the 3′ Processing Reaction
To investigate the 3′ processing reaction catalyzed by the IN/LEDGF and IN/LEDGF/INI1-IBD complexes, we used an HIV-1 U5 viral DNA duplex with the same sequence as for the FCS and fluorescence anisotropy experiments, but labeled at one of its 3′ ends by 6-FAM. The 3′ processing reaction with this DNA duplex releases a fluorescent GT-FAM dinucleotide, which results in a decrease of the fluorescence anisotropy [50]. The release of GT-FAM was monitored as a function of time for the IN/LEDGF and IN/LEDGF/INI1-IBD complexes (Fig. 1C). The results clearly show that the 3′ processing reaction is fully inhibited in the ternary complex (IN/LEDGF/INI1-IBD), indicating that INI1-IBD does not affect DNA binding but protects the 3′ ends of the viral DNA from endonucleolytic cleavage by IN. This result has been confirmed using a gel-based 3′ processing assay (Fig. S7).

Influence of INI1-IBD and LEDGF on the Integration Reaction
The concerted integration reaction performed under standard conditions demonstrates that the recombinant IN/LEDGF complex performs this reaction in a highly efficient way. Half Site Integration (HSI), Full Site Integration (FSI) and donor-donor (d/d) integration products were detected by gel electrophoresis and showed that the global integration efficiency was higher for the IN/LEDGF complex than for isolated IN molecules (Fig. 2A). Specific cloning and quantification of the circular FSI products showed that the IN/LEDGF complex catalyzes 2 to 10 times more concerted integration events than isolated IN molecules (Fig. 2B).
In addition, the structure of the integrated viral DNA was analyzed by sequencing the cloned circular FSI products. The sequences clearly show that the integration reaction catalyzed by the IN/LEDGF complex is closer to the expected physiological reaction than that of IN alone since it produced two times more HIV-1-specific 5 bp staggered cuts of the target DNA (  (Fig. S9).
The maps of the IN/LEDGF/DNA [17] and the IN/LEDGF/INI1-IBD/vDNA complexes could be readily superimposed (Fig. 4A). The existing atomic structures of IN (catalytic core, N- and C-terminal domains) and of the IN-Binding Domain of LEDGF were pre-positioned as determined previously in the IN/LEDGF/DNA model. They were further refined by normal mode flexible fitting (NMFF) and structure idealization. The fitting parameters were chosen so as not to modify the fold of the protein domains, as described in the methods S1. The position of the IN tetramer was found to be unchanged upon addition of INI1-IBD. A density difference map was then calculated between the cryo-EM map and the fitted atomic structures, in order to reveal the positions of INI1-IBD, LEDGF and DNA (Fig. 3C). The difference map shows three groups of additional densities, corresponding to the IN interaction partners (INI1-IBD, LEDGF, vDNA). A saddle-shaped density is observed on the top of the IN dimers (pink density in Fig. 3C), which corresponds to INI1-IBD since its position is identical to the one found in the absence of DNA (Fig. 3A). A second group of additional densities is detected within the viral DNA binding sites, corresponding to bound DNA molecules. The difference map reveals rod-like structures consistent with the size and shape of DNA molecules (yellow densities in Fig. 3C). In the presence of INI1-IBD, two U5 vDNA duplexes could be fitted in the electron density map. Interestingly, the U5 viral DNA duplexes are rotated by about 40° in the INI1-IBD containing complex, as compared to the IN/LEDGF/DNA strand transfer model [17] (Fig. 4B). The third group of additional densities is located at the bottom of the structure and corresponds to the LEDGF protein, which contributes to stabilizing the complex (grey density in Fig. 3C).
The fitting of the atomic models into the IN/LEDGF/INI1-IBD/vDNA map was then further refined in order to reveal more clearly the positions of LEDGF, INI1-IBD and vDNA (Fig. 3D). The atomic structures fitted into the cryo-EM map showed that the complex contains 4 IN, 2 LEDGF, 2 INI1-IBD and 2 U5 vDNA molecules, confirming the mass spectrometry and FCS data. The structure of the complex is organized around two asymmetric IN dimers. The first dimer, formed by two monomers (A1, B1), shows different positioning of their respective N- and C-termini and is related to the second dimer (A2, B2) by a twofold symmetry (Fig. 5). The INI1-IBD dimer interacts on the top of the IN tetramer and contacts mainly the N- and C-termini of the two IN B monomers (NtB1, NtB2, CtB1, CtB2). A small lateral extension of INI1-IBD reaches the C-terminus of the two A monomers (CtA1, CtA2), in close proximity to the viral U5 DNA duplex (Fig. 5).

Discussion
HIV-1 IN is the platform protein present in all steps of the retroviral cycle involving the preintegration complex (PIC). IN forms the structural core of the PIC and is most probably involved in PIC migration along microtubules [53], transfer to the nucleus [54], as well as chromatin targeting [55] and integration.  (Fig. 4). The DNA orientation in the IN/LEDGF/INI1-IBD/vDNA complex is thus intermediate between those in the 3′ processing and integration complexes (Fig. 7).
These tetramer in a stable constrained conformation. These data provide a structural basis for a new inhibition strategy that could be used in human therapy. These observations also strongly support the ability of IN to adapt its structure in order to carry out specific functions directed by the partner protein. In vitro integration assays showed that the activity of the IN/LEDGF complex is strongly enhanced compared to IN alone, especially at low protein concentrations such as those found in vivo. Furthermore, the sequencing of the full-site integration products showed that the proportion of correctly integrated species with the characteristic 5 bp stagger is higher in the presence of LEDGF. These observations further strengthen the role of LEDGF as a molecular chaperone that organizes the IN tetramer into a functional and highly reactive species. Integration assays were also performed in the presence of INI1-IBD by providing 3′ pre-processed vDNA duplexes to overcome the inhibition of the 3′ processing reaction observed in the presence of INI1-IBD. The presence of INI1-IBD leads to fewer integration events and to a higher integration specificity since unwanted by-products such as linear full site integration or donor/donor integration are strongly reduced. These effects can be explained by the IN/LEDGF/INI1-IBD/vDNA structure, which shows that INI1-IBD sits in the target DNA (tDNA) binding site, competing with the binding of tDNA and leading to fewer integration events as well as inhibition of nonspecific integration.
To establish structure-function relationships we hypothesize that the main structural and functional effects of INI1 on IN are mediated through the INI1 integrase binding domain, which has been shown to be the minimal sequence for the interaction of INI1 with IN [31]. INI1 has been shown to both increase [38,41,42] and inhibit [43] viral replication. The two contradictory functions of INI1, inhibition and activation, probably occur at different times during the infection cycle. Little is known about the timing of the interaction of cellular proteins with IN. Assuming that INI1-IBD interacts with IN in the same way as the full length protein, the observation that a stable ternary complex between IN, LEDGF and INI1-IBD can be formed suggests that the two cellular proteins may interact with the PIC during the same temporal window. The interaction of INI1 with the PIC is probably an early event since it was shown that INI1 is incorporated in mature virions [39], that HIV-1 infection triggers the nuclear export of INI1, which associates with the incoming HIV-1 PICs [56], and that INI1 is present in the reverse transcription complex [57]. The fact that INI1 expression in a cell line deleted for the gene encoding INI1 increases viral replication in a dose-dependent manner [38] suggests that IN interacts with these newly produced INI1 molecules. Taken together, these observations suggest that the interaction between INI1-IBD and IN we observe in our structure is likely to occur between reverse transcription and 3′ processing and before nuclear translocation. After nuclear internalization, both INI1 and LEDGF are likely to stabilize the highly flexible IN. LEDGF probably stabilizes the IN tetramer while INI1 might prevent non-specific protein interactions and auto-integration on the way to nucleosomes.
Moreover, INI1, as part of the SWI/SNF chromatin remodeling complex, is believed to play a role in the control of viral integration through the chromatin reorganization of the host genome [41]. Indeed, in vitro experiments showed that stable nucleosomes reconstituted on strongly positioning DNA sequences inhibit the integration of viral DNA and that purified SWI/SNF complexes restore integration, suggesting a coupling between nucleosome remodeling and efficient HIV-1 integration [41]. Thus, SWI/SNF is thought to promote integration in target nucleosomes through its unwinding activity, by producing a suitable nucleosomal DNA for the strand transfer reaction. We speculate that INI1 may be released from IN during the nucleosome remodeling process in order to activate its integration function. In contrast, after INI1 release, LEDGF is likely to remain attached to IN in order to maintain its tetramer organization and to enhance the efficiency of integration (Fig. 7).
Supplementary data are available online: Figures S1-S11 and Methods S1.

Production and Purification of HIV-1 IN, LEDGF and INI1 (174-289) (INI1-IBD)
The IN/LEDGF complex was produced and purified as previously described [17]. HIS-tagged INI1 (174-289) was cloned in a pET expression plasmid and transformed into the Escherichia coli BL21(DE3) host strain (Novagen) containing pRARE plasmids isolated from the Rosetta DE3 strain (Novagen). After the INI1-IBD purification described in methods S1, the IN/LEDGF buffer was adjusted to 2 M NaCl and 20 mM CHAPS. IN/LEDGF and INI1-IBD were mixed at a 1:2 molar ratio, respectively, and dialyzed against buffer B [50 mM HEPES pH 7.0, 500 mM NaCl, 5 mM MgCl2, 2 mM β-mercaptoethanol]. The ternary complex was concentrated using an Amicon Ultra-15 50 kDa device (Millipore) and loaded at 1 mL/min onto a Superdex 200 HR gel filtration column (GE Healthcare) pre-equilibrated in buffer B. Peak fractions were used directly for electron microscopy and functional tests. Protein concentration was determined using the Bradford colorimetric assay (Bio-Rad). The purity of the complex was checked by SDS-PAGE and DNA contamination by UV spectrum.

High-Mass MALDI ToF Mass Spectrometry Analysis
High-Mass MALDI mass spectra were obtained using a MALDI-TOF equipped with the HM2 TUVO High-Mass retrofit system (CovalX AG, Zürich, Switzerland). The High-Mass retrofit system allows sensitive detection (sub-µM) of macromolecules up to 1500 kDa with low saturation. The instrument was operated in the linear mode by applying an accelerating voltage of 20 kV and a gain voltage set to 2.95 kV. Mass spectra were acquired by averaging 300 shots (3 different positions in each spot and 100 shots per position). All subsequent mass spectra acquisitions were performed by applying the same laser fluence before and after cross-linking. Further information is provided in methods S1.
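As a plausibility check on the stoichiometries discussed in the Results, the expected masses of the candidate assemblies can be summed from the component masses reported there (IN 32.8 kDa, LEDGF 60.4 kDa, His6-INI1-IBD 17.2 kDa, U5 vDNA duplex 26 kDa). The short sketch below is only illustrative arithmetic: it ignores the mass added by the cross-linker and the proton of the MH+ ions, and the helper name is hypothetical.

```python
# Component masses in kDa, taken from the values reported in the text.
MASS_KDA = {"IN": 32.8, "LEDGF": 60.4, "INI1-IBD": 17.2, "U5 vDNA": 26.0}

def complex_mass(stoichiometry):
    """Sum of component masses (kDa) for a {component: copies} mapping."""
    return sum(MASS_KDA[name] * copies for name, copies in stoichiometry.items())

in_ledgf = complex_mass({"IN": 4, "LEDGF": 2})
ternary = complex_mass({"IN": 4, "LEDGF": 2, "INI1-IBD": 2})
in_ledgf_dna = complex_mass({"IN": 4, "LEDGF": 2, "U5 vDNA": 2})

print(f"4 IN + 2 LEDGF              : {in_ledgf:.1f} kDa")
print(f"4 IN + 2 LEDGF + 2 INI1-IBD : {ternary:.1f} kDa")
print(f"4 IN + 2 LEDGF + 2 vDNA     : {in_ledgf_dna:.1f} kDa")
```

The last value, close to 304 kDa, agrees with the estimate of about 300 kDa obtained from the FCS diffusion constant of the U5 vDNA-TXR/IN/LEDGF complex, and all three sums lie well below the 500-1000 kDa range in which no species were detected.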

Fluorescence Correlation Spectroscopy
Fluorescence correlation spectroscopy (FCS) measurements were performed with an in-house setup [58,59], consisting of an Olympus IX 71 microscope combined with two-photon excitation at 800 nm, provided by a mode-locked Ti:Sapphire laser (Tsunami, Spectra Physics). Emitted photons were detected with an Avalanche Photodiode (APD SPCM-AQR-14-FC, PerkinElmer Optoelectronics). The normalized autocorrelation function G(t) was calculated online by a hardware correlator (ALV 5000, ALV GmbH). Multiple FCS runs [60] of short duration (5 s) were performed on solutions of viral DNA tagged with Texas Red (vDNA-TXR), without and with IN/LEDGF. The excitation power was about 5 mW at the sample, in order to provide an optimal signal/noise ratio and minimal probe photobleaching [61]. The details of data processing are described in methods S1.
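The two-population analysis mentioned in the Results can be sketched with the textbook autocorrelation model for two freely diffusing species in a 3D Gaussian observation volume, each weighted by the square of its brightness. This is an assumption about the general form only: Eq. 2 of methods S1 is not reproduced in this text, the function and parameter names are hypothetical, and the structure parameter `s` (axial to lateral size ratio) is an illustrative default. The function returns the fluctuation amplitude, without any constant offset.

```python
import math

def diffusion_term(tau, tau_d, s=5.0):
    """Normalized decay for free 3D diffusion in a Gaussian volume."""
    return 1.0 / ((1.0 + tau / tau_d) * math.sqrt(1.0 + tau / (s * s * tau_d)))

def g_two_population(tau, n_total, frac_bound, tau_d1, tau_d2,
                     brightness_ratio, s=5.0):
    """Autocorrelation amplitude for free (1) and bound (2) species.

    brightness_ratio is B2/B1; frac_bound is the number fraction of
    diffusing species that belong to population 2.
    """
    f1, f2 = 1.0 - frac_bound, frac_bound
    b1, b2 = 1.0, brightness_ratio
    numerator = (f1 * b1 ** 2 * diffusion_term(tau, tau_d1, s)
                 + f2 * b2 ** 2 * diffusion_term(tau, tau_d2, s))
    return numerator / (n_total * (f1 * b1 + f2 * b2) ** 2)

# Hypothetical example: half of the species are complexes that diffuse
# three times more slowly and are twice as bright as the free duplex.
for tau in (1e-5, 1e-4, 1e-3, 1e-2):
    g = g_two_population(tau, n_total=2.0, frac_bound=0.5,
                         tau_d1=2e-4, tau_d2=6e-4, brightness_ratio=2.0)
    print(f"tau = {tau:.0e} s -> G = {g:.4f}")
```

With a single population the amplitude at zero lag time reduces to 1/N, which is why a larger intercept indicates fewer diffusing species. Fixing `tau_d1` to the value measured for the free duplex, as described in the Results, leaves the fraction, brightness ratio and second diffusion time as the fitted quantities.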

IN-LEDGF and IN-LEDGF-INI1-IBD 3′ Processing Activity Monitored by Fluorescence Anisotropy
The reaction was done in a 96-well plate. One well contained 100 µL of reaction mix composed of 10 mM NaCl, 25 mM BisTris pH 6.5, 10 mM MgCl2, 5 mM DTT, 50 nM DNA and 200 nM of protein complex. The DNA is a 40 base pair double-stranded DNA, mimicking the U5 end of HIV-1 DNA and modified at its 3′ end by 6-fluorescein. After homogenization, 50 µL of paraffin oil was added on the top of the well to avoid evaporation. Fluorescence anisotropy measurements were performed on a PHERAstar Plus (BMG Labtech) spectrophotofluorimeter with polarized excitation at 470 nm. The reaction was monitored for 6 hours at 37 °C. Further information is provided in methods S1.

In vitro Concerted Integration Assay
Standard concerted integration reactions were performed as described previously [41,62] (15 ng), containing the processed U3 and U5 LTR sequences and a SupF gene, and the target DNA plasmid pBSK+ (150 ng) at 0 °C for 20 min in a total volume of 5 µL. Then, the reaction mixture (20 mM HEPES, pH 7.5; 10 mM DTT; 10 mM MgCl2; 15% DMSO; 8% PEG, 30 mM NaCl) was added and the reaction proceeded for 120 min at 37 °C in a total volume of 10 µL. Incubation was stopped by adding a phenol/isoamyl alcohol/chloroform mix (24/1/25 v/v/v). The aqueous phase was loaded on a vertical 1% agarose gel in the presence of 1% bromophenol blue and 1 mM EDTA. After separation of the products, the gel was treated with 5% TCA for 20 min, dried and autoradiographed. All IN activities were quantified by scanning the bands (half site plus full site integration products) after gel electrophoresis and autoradiography using the ImageJ software. Both target DNA and donor plasmids were kind gifts from Dr. K Moreau (Université Claude Bernard-Lyon I, France). The target corresponds to the pBSK+ plasmid (Stratagene, La Jolla, California), carrying the zeocin resistance-encoding gene.
The Full Site Integration (FSI) reaction was additionally quantified by cloning the integration products into bacteria using the same protocol as described previously [62]. Briefly, after concerted integration, the products were purified on a DNA purification system column (Promega), as described by the supplier, and then introduced into a MC1060/P3 E. coli strain which contained ampicillin-, tetracycline- and kanamycin-resistance genes. Both ampicillin- and tetracycline-resistance genes carry an amb mutation. These proteins are thus expressed only in the presence of supF gene products. Integration clones carrying the supF gene were therefore selected in the presence of 40 µg/mL ampicillin, 10 µg/mL tetracycline and 15 µg/mL kanamycin. The integration loci structure was determined by isolating plasmids from quadruple-resistant colonies and PCR sequencing (ABI Prism big dye terminator cycle sequencing ready reaction kit, Applied Biosystems) using the U3 primer (5′-TATGGAAGGGCTAATTCACT-3′) and the U5 primer (5′-TATGCTAGAGATTTTCCACA-3′).

Electron Microscopy and Image Processing
For negatively stained samples, the purified IN/LEDGF/INI1-IBD complexes were diluted to a concentration of 20 µg/mL in buffer B and crosslinked with 0.1% glutaraldehyde for 5 s. 10 µL of this preparation were placed on a 10 nm thick carbon film treated by a glow discharge in air. After two minutes of adsorption, the specimen was negatively stained with a 2% (w/v) uranyl acetate solution. Images were recorded under low-dose conditions on a transmission electron microscope (TEM, Philips CM120), equipped with a LaB6 cathode and operating at 100 kV at 45,000 X magnification on a Peltier-cooled slow scan CCD. For frozen hydrated samples, the complexes were diluted and adsorbed as described above, but were vitrified using an automated plunger equipped with a temperature and humidity controlled chamber (Vitrobot, FEI). Images were recorded under low-dose conditions (15-20 e⁻/Å²) on a cryo-TEM equipped with a field emission gun operating at 200 kV (Tecnai F20, FEI) and with a side-entry cold stage working at a temperature of −172 °C. The image data set was acquired on photographic plates (SO163, Eastman-Kodak) at 50,000 X magnification and at defocus values ranging from 2.7 to 3.9 µm. The micrographs were digitized using a drum scanner (Primescan D7100, Heidelberg) to obtain a final pixel spacing of 0.2 nm. Examples of the initial images are shown in figure S10. Class average images obtained after reference-free classification and the corresponding re-projections of the final 3-D model fitting are shown in figure S11. Image analysis is described in methods S1.
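The digitization figures quoted above can be related by simple arithmetic. The sketch below assumes that the nominal 50,000 X magnification applies exactly at the photographic plate and that no binning was performed after scanning; neither is stated in the text, and the helper names are hypothetical. It computes the scanner step on the plate implied by a 0.2 nm pixel at the specimen, and the corresponding Nyquist limit on resolution.

```python
def scan_step_um(pixel_nm, magnification):
    """Scanner step on the plate (µm) for a given specimen-level pixel size."""
    return pixel_nm * magnification / 1000.0  # nm on the plate -> µm

def nyquist_limit_nm(pixel_nm):
    """Finest resolvable period (nm) for a given pixel spacing."""
    return 2.0 * pixel_nm

step = scan_step_um(0.2, 50_000)
limit = nyquist_limit_nm(0.2)
print(f"scanner step on plate: {step:.1f} µm")
print(f"Nyquist limit: {limit:.1f} nm")
```

Under these assumptions a 0.2 nm pixel corresponds to a 10 µm step on the plate, and the sampling itself does not limit the reconstruction above 0.4 nm.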

Model Building and Fitting
Flexible fitting of the atomic structures in the EM maps was done using normal mode analysis [63,64]. Normal mode flexible and rigid body fitting were performed with the procedure described in methods S1, using NORMA [63] and URO [65]. Regularization of the structure was done with REFMAC [66]. Difference maps were calculated using CCP4 [67] and COOT [68] and map superposition was performed with UCSF Chimera [69]. Figures were produced with UCSF Chimera and PyMOL [70]. Further information is provided in methods S1.
[71] in pink, the Prototype foamy virus (PFV) [72] in cyan, the Human immunodeficiency virus type 2 (HIV-2) [73] in yellow and the Human immunodeficiency virus type 1 (HIV-1) [12] in red. The N-terminal domains are circled in blue. B. Superimposition of the structures of the IN catalytic core domain of the Rous sarcoma virus (RSV) [74] in gold, HIV-1 [11] in red and PFV [72] in cyan.
Methods S1 Detailed information on the experimental methods used (Production and Purification, High-Mass MALDI ToF, Fluorescence Correlation Spectroscopy, Fluorescence Anisotropy, Cryo-electron Microscopy) is provided. (PDF)