Structure of a Spumaretrovirus Gag Central Domain Reveals an Ancient Retroviral Capsid

The Spumaretrovirinae, or foamy viruses (FVs), are complex retroviruses that infect many species of monkey and ape. Despite little sequence homology, FV and orthoretroviral Gag proteins perform equivalent functions, including genome packaging, virion assembly, trafficking and membrane targeting. However, there is a paucity of structural information for FVs and it is unclear how disparate FV and orthoretroviral Gag molecules share the same function. To probe the functional overlap of FV and orthoretroviral Gag we have determined the structure of a central region of Gag from the Prototype FV (PFV). The structure comprises two all α-helical domains, NtDCEN and CtDCEN, which, despite having no sequence similarity to orthoretroviral CA, share the same core fold as the N- (NtDCA) and C-terminal (CtDCA) domains of archetypal orthoretroviral capsid protein (CA). Moreover, structural comparisons with orthoretroviral CA align PFV NtDCEN and CtDCEN with NtDCA and CtDCA respectively. Further in vitro and functional virological assays reveal that residues making inter-domain NtDCEN-CtDCEN interactions are required for PFV capsid assembly and that intact capsid is required for PFV reverse transcription. These data provide the first information that relates the Gag proteins of Spuma and Orthoretrovirinae and suggest a common ancestor for both lineages containing an ancient CA fold.


Introduction
Spuma- or foamy viruses (FVs) are complex retroviruses that constitute the only members of the Spumaretrovirinae subfamily within the Retroviridae family [1]. They have been isolated from a variety of primate hosts [2][3][4][5] as well as from cats [6][7][8], cattle [9], horses [10] and sheep [11]. Endogenous FVs have also been described in sloth [12], aye-aye [13] and coelacanth [14]. Prototype foamy virus (PFV) is a FV isolated from human sources [15,16]. The PFV genome is highly similar to that of simian foamy virus isolates from chimpanzee (SFVcpz) and so infection in humans is believed to have arisen through zoonotic transmission [17][18][19]. Nevertheless, even though FVs are endemic within non-human primates and display a broad host range, human-to-human transmission of PFV has never been detected. Moreover, although in cell culture FV infection causes pronounced cytopathic effects [20], infection in humans and natural hosts is apparently asymptomatic [21][22][23], making their usage as vectors for gene therapy an attractive proposition [24].
FVs share many similarities with other retroviruses in respect of their genome organisation and life cycle. However, they vary from the Orthoretrovirinae in a number of important ways. These include the timing of reverse transcription, which occurs in virus producer cells rather than newly infected cells [25,26], and the absence of a Gag-Pol fusion protein [27,28]. In addition, the Gag protein remains largely unprocessed in FVs [29] whereas within the Orthoretrovirinae processing of the Gag polyprotein represents a critical step in viral maturation, producing the internal structural proteins Matrix (MA), Capsid (CA) and Nucleocapsid (NC) found in mature virions. Furthermore, FV Gag lacks the Major Homology Region (MHR) and Cys-His boxes found in orthoretroviral CA and NC, respectively. Despite these profound dissimilarities, the Gag proteins of the two retroviral subfamilies carry out the same functional roles, including viral assembly, nucleic acid packaging, transport to and budding through the cytoplasmic membrane of the producer cell as well as trafficking through the cytoplasm of the target cell and uncoating. In addition, FV Gag also contains the determinants for restriction by Trim5α [30,31], which in orthoretroviruses reside in the assembled CA lattice [32].
To date, high-resolution X-ray and/or NMR structures have been reported for MA, CA and NC components of Gag from numerous retroviruses [33][34][35][36][37][38][39][40][41][42] but among FVs only the structure of the Env-binding N-terminal domain of PFV-Gag has been reported [43]. Further structural information with regard to other Gag domains of FVs has remained elusive but is vital for any detailed understanding of how FV Gag fulfils its many functions. Here we report the structure and present structure/function studies of a di-domain from the central region of PFV-Gag. Our data reveal that although unrelated at the level of primary sequence, FV central domains are structurally related to the N- and C-terminal domains of orthoretroviral CA. Moreover, they share the capacity for self-association and are required for virion capsid assembly and viral infectivity. Further phylogenetic and combined comparative structural analysis reveals FV central domains also have the same organisational arrangement as orthoretroviral CA and we propose that both arose through genetic divergence from a common, double domain ancestor.

Structure of the PFV-Gag central conserved region
Alignment of the primary sequences of FV Gag proteins from primate and other mammalian hosts reveals two regions of strong conservation: an N-terminal region corresponding to the Env-binding domain [43][44][45] containing the cytoplasmic targeting and retention sequence (CTRS) [46,47], and a central region containing the highly conserved PGQA and YxxLGL sequences [48], located just N-terminal to the chromatin binding sequence (CBS) [49] and GR boxes [50] (Fig 1A). Within this central region, large sections of highly conserved sequence are present (Fig 1B). Therefore, to understand more about the nature of the PFV-Gag central conserved region, the structure of PFV-Gag(300-477) was determined in solution using multidimensional heteronuclear NMR spectroscopy. Details of data collection, structure determination and model quality are presented in Table 1.
The structure comprises two all-helical domains, connected by a short 5-residue linker (Fig 1C). Residues P300-H383 make up the N-terminal domain (PFV-NtD CEN), containing four helices (α1-α4), and the C-terminal domain (PFV-CtD CEN), residues H389-R477, contains the remaining five helices (α5-α9). Superposition of the 20 conformers in the family of structures results in a backbone atom rmsd of 0.3 Å for ordered residues 304-355, 358-477, showing that the structure is well defined except for the N- and C-termini and loop regions (S1A Fig). In PFV-NtD CEN, helices α1-α3 form an antiparallel 3-helix bundle connected to α4 by a long loop that closely tracks one face of α3. In PFV-CtD CEN, helices α5-α9 are arranged as a five-helix antiparallel bundle. In both domains, the inner faces of the helices pack to form an extensive hydrophobic core through interaction of apolar sidechains.
Examination of the protein backbone dynamics using 15 N NMR relaxation measurements (S1B Fig) shows that residues within helices α1-α4 and α5-α9 of the PFV-NtD CEN and PFV-CtD CEN exhibit large and positive heteronuclear NOE (HetNOE) values and have uniform 15 N-T 1 and -T 2 values, indicating a rigid backbone. Additionally, the presence of interdomain NOEs, together with little variation in the T 1 /T 2 values, suggests the PFV-NtD CEN and PFV-CtD CEN are structurally and dynamically dependent and have a coupled movement. Based on these relaxation rates and assuming an isotropic model, a rotational correlation time (t c) of 14.1 ns for the NtD CEN -CtD CEN di-domain was determined, consistent with a ~20 kDa globular protein. The residues at the N- and C-termini outside of this core region have lower T 1 and higher T 2 values, reduced or negative HetNOEs, close to zero 1 D NH residual dipolar couplings (RDC) and mainly random coil chemical shifts, indicating rapid (psec) internal motion in these terminal regions. In addition, the relaxation data also reveal internal regions of high mobility, including residues G356 to G366 located in the long loop connecting α3-α4, residues G384 to P388 in the NtD CEN -CtD CEN interdomain linker and G432, part of a stretch of highly conserved residues (-P 431 -G-Q-A 434 -) located in the loop connecting α7-α8 of CtD CEN and in close spatial proximity to the conserved Y/F 464 -x-x-L-G-L 469 motif (Fig 1A and 1B) at the C-terminus of α9 that is required for Gag assembly [48]. Together with these relaxation data, a number of interdomain NOEs (S1C Fig) define a largely hydrophobic NtD CEN -CtD CEN interface comprising 550 Å2 of buried surface area (Fig 1D). Although not extensive in area, there is substantial packing of apolar sidechains from NtD CEN residues on helices α2 and α4 (I326, V375 and F379) with CtD CEN residues (V394, I398, L410, M413 and L414) on helices α5 and α6 (Fig 1D) that contribute to the stability of the interface.
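The relationship between 15N relaxation times and a rotational correlation time can be illustrated with a short sketch. It uses a common slow-tumbling approximation for an isotropic rotor, which is an assumption here: the text does not state which expression or software was used. The input T 1, T 2 and 15N frequency are placeholders chosen only so that the output lands near the reported value; they are not taken from the data.

```python
import math

def rotational_correlation_time(t1_s, t2_s, nu_n_hz):
    """Estimate the rotational correlation time (seconds) from 15N T1 and T2.

    Uses the common slow-tumbling approximation for an isotropic rotor:
        tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N)
    where nu_N is the 15N resonance frequency in Hz.
    """
    ratio = t1_s / t2_s
    if 6.0 * ratio - 7.0 <= 0.0:
        raise ValueError("T1/T2 too small for the slow-tumbling approximation")
    return math.sqrt(6.0 * ratio - 7.0) / (4.0 * math.pi * nu_n_hz)

# Placeholder inputs (NOT from the paper): mean T1 and T2 over rigid residues
# and a 15N frequency of 60.8 MHz (i.e. a 600 MHz spectrometer).
T1_S = 1.20
T2_S = 0.0585
NU_N_HZ = 60.8e6

tau_c_ns = rotational_correlation_time(T1_S, T2_S, NU_N_HZ) * 1e9
print(f"tau_c ~ {tau_c_ns:.1f} ns")
```

In practice the T 1 /T 2 ratio would be averaged only over residues in rigid secondary structure, since mobile loops and termini bias the estimate.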
Structural similarity with CA of other retroviral genera
Initial structural similarity searches of the PDB with PFV-Gag(300-477), PFV-NtD CEN and PFV-CtD CEN were conducted using the SSM server [51]. Application of this approach produced only very weak matches based on the quality of alignment Q-scores (0.1-0.3). Nevertheless, 11 of the top 15 alignments for individual NtD CEN and CtD CEN domains were with either amino-(NtD CA) or carboxyl-terminal domains (CtD CA) from orthoretroviral CAs (S1 Table). However, although matches were found for NtD CEN with orthoretroviral NtD CA domains and for CtD CEN with orthoretroviral CtD CA domains and the helical connectivity and topological arrangement of secondary structures were largely conserved (S2

Fig 1. (A) Regions corresponding to the Gag-NtD and Gag-central domains are coloured cyan and magenta respectively. Sequence motifs and conserved regions are highlighted: cytoplasmic targeting and retention sequence (CTRS) (blue), PGQA and YxxLGL (orange), chromatin binding sequence (CBS) (green) and GR boxes (yellow). The Gag processing cleavage site is indicated with an arrow. (B) Sequence alignment of foamy virus Gag-central domains from mammals, old and new world monkeys (SFV). Mammalian FVs are abbreviated as follows: BFV, Bovine; EFV, Equine; FFV, Feline. Monkey species are abbreviated as follows: mac, Macaque; agm, African green monkey; spm, Spider monkey; sqm, Squirrel monkey; mar, marmoset. Numbering corresponds to the PFV sequence. Cartoons (cyan coils) above the alignment indicate the position of α-helices in the PFV-Gag NtD CEN and CtD CEN domain structures. The regions with greatest sequence homology are boxed and highlighted and residues that are conserved in all sequences are also coloured white. (C) Cartoon representation of PFV-Gag (300-477); the backbone is shown in cyan. The secondary structure elements are numbered sequentially from the amino-terminus and the N- and C-termini are indicated. Helices α1 to α4 and α5 to α9 that comprise NtD CEN and CtD CEN respectively are indicated in the left and right hand panels. (D) The PFV-Gag NtD CEN and CtD CEN interface. The protein backbone is shown in grey cartoon representation. NtD CEN and CtD CEN α-helices that pack at the interface are labelled. Residues that make hydrophobic contacts are shown as sticks, blue from NtD CEN and green from CtD CEN.

Inspection of these alignments reveals a closest match for PFV-Gag NtD CEN with the CtD CA of the alpha-retrovirus RSV (3G1G) based upon rmsd over all aligned α-carbons. However, in all these alignments the orthoretroviral CtD CA structures contain an additional α-helix that inserts between α3 and α4 of NtD CEN (Fig 2D and 2E). Structural alignments with orthoretroviral NtD CA reveal that the closest match is between PFV-Gag CtD CEN and the NtD CA of the gamma-retrovirus MLV (3BP9) (Fig 2H). Again, however, although the core fold aligns well, the interspersing loops that connect the secondary structure elements in the orthoretroviral NtD CA are absent or much shorter in PFV-Gag CtD CEN.
These data provide evidence for a structural conservation between orthoretroviral CA and spumaretroviral Gag but these very weak alignments do not discriminate well between NtD CEN −NtD CA , CtD CEN −CtD CA (forward; NN, CC) and NtD CEN −CtD CA , CtD CEN −NtD CA (reverse; NC, CN) pairings. Therefore, to assess the significance and quantify the degree of similarity for forward and reverse pairings we applied a structural alignment method based on the generation of a population of 'decoy' models to provide a background distribution of scores [52] combined with structural superposition using the SAP program [53]. This method has the advantage that it uses a local structural environment-based alignment and that each comparison in the random pool is between two models of the same size and secondary structure composition as the pair of native structures being investigated.
For this analysis five orthoretroviral CA proteins were chosen where both NtD CA and CtD CA structures were available. Individual CA domains were then compared with both PFV-Gag NtD CEN and CtD CEN and the associated decoy models. The degree of similarity between the domains with respect to the bulk alignments with decoy models ranged from < 2σ to > 5σ (Z-score). However, as with the SSM searches, significant 4σ results were obtained for both reverse and forward alignments (Table 2). Of the top five Z-scores in Table 2, four are associated with N-N and C-C pairings. Although this does suggest conventional forward linear domain equivalence, in order to obtain a more quantitative consensus for forward versus reverse domain pairings, the Z-scores for each domain pairing were combined using a T-test statistic over all five viruses. Employing this analysis, all four possible domain pairings were significant with probabilities (T prob) ranging from 10^-6 to > 10^-18. However, the two reversed pairings (NC and CN) have lower probabilities than the forward pairings (NN and CC) (Table 2), and by combining the probabilities as log10(T prob NN · T prob CC) - log10(T prob NC · T prob CN), a 12-log difference-probability (ΔT prob) is apparent for the forward pairing with respect to the reverse. Both the T and Z statistics support an ancestral relationship between the central domains of PFV-Gag and the NtD CA and CtD CA of orthoretroviral CA. This suggested forward pairing (NN and CC) would support the notion that the orthoretroviral CA and PFV-Gag NtD CEN -CtD CEN arose through genetic divergence from a common, double domain ancestor without a requirement for transposition.
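The log-difference used to compare forward and reverse pairings is simple arithmetic, sketched below. The four probabilities are placeholders, not the Table 2 values, and the function name is hypothetical. The sign of the result depends on how T prob is defined, which the text does not spell out, so only the magnitude of the difference is meaningful in this illustration.

```python
import math

def delta_log_prob(p_nn, p_cc, p_nc, p_cn):
    """Return log10(p_nn * p_cc) - log10(p_nc * p_cn).

    Computed as sums of logs so that very small probabilities
    do not underflow when multiplied together.
    """
    forward = math.log10(p_nn) + math.log10(p_cc)
    reverse = math.log10(p_nc) + math.log10(p_cn)
    return forward - reverse

# Placeholder probabilities, NOT the values reported in Table 2.
example = delta_log_prob(p_nn=1e-18, p_cc=1e-12, p_nc=1e-10, p_cn=1e-8)
print(f"difference = {abs(example):.1f} log units")
```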

Oligomerisation state of foamy virus Gag-central domains
Given the requirement for CA oligomerisation in orthoretroviral Gag assembly and maturation, the self-association and assembly properties of PFV-Gag(300-477), PFV-NtD CEN and PFV-CtD CEN were analysed by sedimentation velocity (SV) and equilibrium (SE) analytical ultracentrifugation (AUC). The experimental parameters, molecular weights derived from the data and statistics relating to the quality of fits are shown in Table 3.
Table 3 footnotes: The S 20,w value remained constant across the concentration range tested. c, the weight averaged molecular weight derived from the best fit C(S) function.
SV-AUC analysis of the whole of the conserved region, PFV-Gag(300-477), revealed a sedimentation coefficient (S 20,w) of 1.87 (Fig 3A) and derived molar mass of 20.6 kDa, demonstrating that PFV-Gag(300-477) is a stable monomer in solution. These observations were confirmed by multispeed SE-AUC at varying protein concentration. The equilibrium distribution from an individual multispeed experiment is presented in Fig 3B. The individual gradient profiles showed no concentration dependency of the molecular weight and fit globally with a single ideal molecular species model, producing a weight averaged molecular weight of 20.3 kDa, demonstrating the monomeric nature of this PFV central region. SV-AUC analysis of PFV-Gag NtD CEN measured at high protein concentration (188 μM) also revealed this domain to be monomeric in solution, with only a single species present, with S 20,w of 1.25 (Fig 3A) and derived molar mass of 10.3 kDa (Table 3). By contrast, SV-AUC data recorded on PFV-Gag CtD CEN produced a sedimentation coefficient continuous distribution function, C(S), that contained two species with S 20,w of 1.65 and 2.07 and derived molecular weights of 14.7 kDa and 20.7 kDa (Table 3 and Fig 3A). Notably, the proportion of the fast 2.07 S component increased with increasing concentration (S3 Fig), consistent with a monomer-dimer equilibrium.
Therefore, in order to quantify the affinity and stoichiometry of self-association, multispeed SE-AUC recorded at varying protein concentration was employed. These data (Fig 3B) are best fit by a monomer-dimer self-association model where the 11.9 kDa PFV-Gag CtD CEN monomers dimerise with an equilibrium association constant of 1.1 x 10^6 M^-1 (K D = 0.9 μM). These data are consistent with the distribution of peaks in the C(S) functions derived from SV-AUC data. Moreover, they reveal that whilst the entire PFV-Gag central region is monomeric, PFV-Gag CtD CEN has the propensity for self-association.
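The conversion between the association and dissociation constants, and the concentration dependence expected for a monomer-dimer equilibrium, can be sketched as follows. This is a minimal illustration assuming an ideal two-state model with K A = [D]/[M]^2. The function names and the loading concentrations are hypothetical; only the value of K A comes from the text.

```python
import math

def dissociation_constant(k_a):
    """K_D (M) for a monomer-dimer equilibrium with association constant K_A (M^-1)."""
    return 1.0 / k_a

def dimer_fraction(total_monomer_conc, k_a):
    """Fraction of protein chains present in dimers at a given total concentration (M).

    Mass balance: c_total = [M] + 2[D], with [D] = K_A * [M]^2, which gives
    2*K_A*[M]^2 + [M] - c_total = 0; the positive root is the free monomer.
    """
    if total_monomer_conc <= 0.0:
        return 0.0
    free_monomer = (-1.0 + math.sqrt(1.0 + 8.0 * k_a * total_monomer_conc)) / (4.0 * k_a)
    return 1.0 - free_monomer / total_monomer_conc

K_A = 1.1e6  # M^-1, the association constant reported in the text
k_d_uM = dissociation_constant(K_A) * 1e6
print(f"K_D = {k_d_uM:.2f} uM")

# Hypothetical loading concentrations, chosen only for illustration.
for conc_uM in (1.0, 10.0, 100.0):
    frac = dimer_fraction(conc_uM * 1e-6, K_A)
    print(f"{conc_uM:6.1f} uM total -> {frac:.2f} of chains in dimers")
```

The rising dimer fraction with concentration is the behaviour that the SV-AUC peak proportions reflect qualitatively.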

The PFV-Gag CtD CEN homodimer
Given the dimerisation properties of PFV-Gag CtD CEN and the structural homology with self-associating orthoretroviral CA-domains, we determined the solution structure of the PFV-Gag CtD CEN homodimer. Details of data collection and structure determination are presented in Table 1. The homodimer interaction is defined by numerous NOEs (S4C Fig) and encompasses 470 Å2 of buried surface. The interface is largely hydrophobic, with the majority of interactions resulting from packing of α6 of one monomer against α6 of the opposing monomer together with some contribution from hydrophobic side chains of residues on α5 (Fig 4B). At the centre of the interface the side chains of I398, L410 and M413 from one monomer pack against I398*, L410* and M413* of the opposing monomer and comprise a continuous apolar network. Disruption of this network by introduction of an L410E/M413E double mutation results in total loss of dimerisation as revealed by SV-AUC analysis (Fig 4C). Notably, I398, L410 and M413 are also involved in the NtD CEN -CtD CEN interface, where they make apolar contacts with side chains of residues on α2 and α4 in NtD CEN (Fig 1D).

NtD CEN -CtD CEN interface mutations affect virus infectivity and particle morphology
To probe the function of domain interface residues in a virological context, V375Q and L410E/M413E amino acid interface-disrupting mutations were introduced into PFV-Gag in a mammalian virus expression system. In addition, W371A or C368A alanine substitution mutations designed to disrupt hydrophobic packing of the Gag-NtD CEN domain were also made, along with particles lacking functional reverse transcriptase (iRT). The effects of these substitutions on virus Gag/Env/Pol processing, particle production and infectivity were then assessed (Fig 5). In all instances, viral particles were produced and the composition and processing of Gag, Pol and Env was comparable with wt PFV (Fig 5A), although overall particle production was reduced 3-5 fold in all of the mutants (Fig 5B). In contrast to these small particle production defects, viral infectivity upon introduction of V375Q and L410E/M413E interface mutations was reduced by over 4 orders of magnitude (Fig 5C), comparable with the 3-4 log reductions observed in W371A and C368A NtD CEN disruption mutants and the 4 log reductions observed with a combined W371A/V375Q mutant or a Gag wt /Pol iRT virus.
Given these large effects on viral infectivity, the morphology and integrity of particles was also assessed by cryo-electron microscopy (cEM) (Fig 6). Analysis of wt PFV (Fig 6A and S2 Table) reveals roughly spherical 1000 to 1300 Å diameter particles with external spikes of the Env protein and core structures as previously described [45,54]. We performed cryo-tomography to study virus particles in 3-dimensions. The majority of particles contain a dense core structure, 600 to 800 Å, in their interior. In some instances, two cores were present, often correlating with a larger virion size, as observed with other foamy virus [45] and orthoretroviral particles [55]. Inspection of the core morphology revealed that it comprised an 80-100 Å layer that is strongly faceted and contains vertices indicative of a polyhedral structure with underlying icosahedral order. By contrast, although of similar size and displaying Env spikes, no virus particles with V375Q and L410E/M413E interface mutations contained an internal dense core, indicating they have defects in core assembly (Fig 6B and S2 Table). The particles appear either empty or in some cases contain a diffuse layer of density close to the inner side of the viral envelope. Similarly, particles of NtD CEN disruption mutants C368A and W371A also have wt size distribution and external morphology but have no cores (Fig 6B and S2 Table), demonstrating that mutations affecting NtD CEN -CtD CEN interactions and those designed to interfere with Gag central domain folding are both deleterious to core assembly.
The effects of the interface and NtD CEN disruption mutations on reverse transcription of the viral genome were also examined by qPCR. These data (Fig 7) revealed that all particles contained similar levels of PFV RNA, suggesting that there was no requirement for an assembled viral core to recruit and/or package RNA genomes. However, quantitation of viral DNA revealed that in both the interface and Gag-NtD CEN disruption mutants that lack cores, there was a 100-fold reduction in the DNA genome content. The DNA genome content of the iRT mutant was reduced 1000-fold. Given that reverse transcriptase is recruited into particles in the mutants with a comparable efficiency to wt (Fig 5A), these data reveal a requirement for core formation in order for efficient reverse transcription to occur.

Discussion
The foamy virus Gag central domain is related to orthoretroviral CA
Gag is the major structural protein of both spuma and orthoretroviral subfamilies, required for viral assembly, genome packaging and budding from producer cells [56]. Nevertheless, despite the conservation of function, spuma and orthoretroviral Gag share little if any sequence identity [57]. Any relatedness in terms of structure therefore remains unclear. Previous studies have shown that an N-terminal domain from spumaretroviral Gag (PFV-Gag-NtD), whilst possessing some of the functional properties of orthoretroviral Gag MA and CA maturation products, is entirely unrelated on a structural level [43]. We have now determined the solution structure of a central region of PFV-Gag (NtD CEN -CtD CEN). By contrast with the N-terminal region, this structure reveals that the central region of spumaretroviral Gag has unanticipated structural similarity to the NtD CA and CtD CA of orthoretroviruses. The NtD CEN and CtD CEN domains comprise four- and five-helix bundles, respectively, that in terms of topology align well with secondary structure elements of NtD CA and CtD CA domains. However, overall the alignment is relatively weak and although the core helical bundles are structurally very similar, the orthoretroviral NtD CA and CtD CA contain additional helices and loop insertions. We therefore applied an unbiased objective approach to assess the degree of similarity between PFV-NtD CEN and PFV-CtD CEN with NtD CA and CtD CA domains [52,53]. This analysis confirmed the relationship between the spuma- and orthoretroviral sequences and revealed that by far the preferred statistical alignment was also the most plausible on biological grounds, specifically a "forward pairing" where PFV-NtD CEN corresponds to NtD CA and PFV-CtD CEN relates to CtD CA.
Based on these observations, it is reasonable to conclude that the related central regions of the Gag proteins of spuma- and orthoretroviruses, as well as having conserved functions, have arisen as a result of genetic divergence from a common, double domain ancestor.

Gag assembly
The capacity to form an assembled lattice is a key feature of retroviral Gag proteins. These structures have been well characterised for mature orthoretroviruses [58], though the versions present in immature viruses remain relatively poorly defined [59][60][61]. Nevertheless, it is clear that the formation of CA hexamers is vital for the assembly process. By contrast, there is much less information available regarding spumavirus Gag-mediated assembly. It has been demonstrated that PFV-Gag-NtD self-associates into dimers [43]. Our findings now identify PFV-Gag (NtD CEN -CtD CEN) as a region that is structurally related to orthoretroviral CA and has the functional properties of a protein involved in capsid assembly; moreover, the FV polyhedral core structure is dependent on PFV-Gag (NtD CEN -CtD CEN) structural integrity.
A clue to how PFV Gag might assemble is revealed by the structure of PFV-CtD CEN (Fig 4). In isolation, PFV-CtD CEN forms weak dimers, K D = 0.9 μM (Fig 3), through homotypic interactions mediated by hydrophobic side chains located on helices α5 and α6. This is in contrast to the orthoretroviruses, where the major CA-CtD interface is formed through homotypic interactions between residues on CA-CtD α9 that would align to α7 in PFV-CtD CEN and therefore appears unrelated. Nevertheless, in the context of intact PFV-Gag, formation of these CtD CEN -CtD CEN interactions would require conformational rearrangement to expose the α5-α6 interface, which would consequently release the NtD CEN domains to make further homotypic interactions. However, given we have demonstrated the capacity for CtD CEN self-association, it is a possibility that the CtD CEN -CtD CEN interface is utilised by FV-Gag in CA assembly. Moreover, since Gag conformational switching is a major driver in the maturation of orthoretroviruses [59][60][61][62], the notion of a conformational change in FV Gag is certainly plausible. In further support of this notion, the Major Homology Region (MHR) of orthoretroviral CA is a critical driver of maturation and assembly [63][64][65]. The MHR comprises a strand-turn-helix structure that makes intra-hexamer homotypic CA-CtD interactions in the immature CA lattice [60,61] and maps to the α5 and α6 region of PFV-CtD CEN in our alignments. Therefore, although the α5-α6 and MHR motifs are structurally unrelated, their positioning suggests a conservation of assembly function in this region.
Another prominent feature of PFV-Gag is the YxxLGL motif (Fig 1A) (residues Y464-L469) that is conserved in all spumaretroviruses (Fig 1B) and is required for particle assembly [48]. In the PFV-Gag(NtD CEN -CtD CEN) structure this motif is found at the C-terminus of α9 in CtD CEN (S5 Fig). The aromatic side chain of Y464 packs into a hydrophobic pocket and forms part of the core of the CtD CEN helical bundle. Notably, as only Y or F are observed at this position amongst FV Gags (Fig 1B), the conservation is likely a result of the structural requirement for a phenyl group at this position to be buried in the hydrophobic core. By contrast, the side chains in the LGL portion of the motif are exposed and abut residues from another highly conserved PGQA motif at the N-terminus of α8 in CtD CEN (residues 431-434; Fig 1B) to form a continuous surface hydrophobic patch located ~180° away from the α5-α6 interface of CtD CEN (S5 Fig). Given the requirement for capsid assembly, one notion is that α5-α6 homotypic interactions and further self-association through the YxxLGL/PGQA surface patch, when combined with PFV-Gag-NtD dimerisation, might also give rise to hexameric assemblies analogous to those formed in orthoretroviruses. However, the helices containing the YxxLGL/PGQA patch actually align with α10 and α11 of orthoretroviral CA, which are not major drivers of orthoretroviral CA assembly, suggesting there might be an alternative packing arrangement of a spumaretroviral Gag assembly.

Fig 5. Particle production and infectivity of PFV-Gag central domain mutants. (A) Western blot analysis of producer cell lysates (Cell) and pelleted viral supernatants (Virus) with polyclonal antibodies specific for PFV-Gag (α-Gag) and PFV Env-LP (α-Env-LP) or monoclonal antibodies specific for PFV-PR/RT (α-PR/RT) and integrase (α-IN). Residue substitutions in Gag are indicated above each track, (wt) wild type virus, (wt +iRT) wild type virus with defective reverse transcriptase. In the right-hand panel %wt are different wt control loadings and arrows indicate the migration of Gag, Env and Pol proteins. (B) Relative amounts of released Gag quantified from Western blot data from two independent experiments. (C) Relative infectivity of extracellular 293T cell culture supernatants using an eGFP marker gene transfer assay, determined 3 days post infection. Means and standard deviations of three independent experiments are shown. The values obtained using the wild type Gag packaging vector were arbitrarily set to 100%. Absolute titres of these supernatants were 1.8 x 10^6 to 1.1 x 10^7 ffu/ml.

Capsid formation and reverse transcription
Introduction of interface mutations V375Q and L410E/M413E or YxxLGL motif mutants [48] has little effect on virus assembly or RNA encapsidation. By contrast, dramatic effects are observed on the formation of morphologically intact cores, particle DNA content and infectivity. These seemingly incompatible data might be reconciled in the following way. It is known that initial FV capsid formation occurs within the cell cytoplasm and simultaneously viral RNA is recruited by Gag via the GR-regions [54]. Subsequently, FV Env leader peptide binds Gag to facilitate membrane targeting and particle release [45]. However, it has been demonstrated that cleavage of PFV p71-Gag to generate p68-Gag is required for the initiation of reverse transcription [66]. Furthermore, it has been shown that proteolytic processing of the Gag protein of S. cerevisiae Ty1 transposable elements that assemble in the cytoplasm is also required for reverse transcription and transposition activity [67,68]. Although we cannot rule out that in FVs Env binding to Gag might be a trigger for conformational rearrangement, we suggest that Gag cleavage to form p68 initiates the rearrangement of Gag, resulting in the appearance of the discrete capsid layer observed by cEM. The absence of viral DNA genomes in released mutant virions (Fig 7) implies that this Gag rearrangement and capsid shell formation is a requirement for one or more steps in reverse transcription and may be analogous to maturation in orthoretroviruses.

Capsid structure and restriction
Members of the Trim5α family of restriction factors block infection of cells by HIV-1, as well as other lentiviruses, gammaretroviruses and the FVs [31,69]. Orthoretrovirus restriction requires interaction of Trim5α with the CA component of Gag in the context of an assembled capsid shell [32,70] consistent with the genetic mapping within CA of the amino-acid determinants for restriction specificity [71,72]. It appears that CA-hexamers, the basic building block for core assembly, represent the primary target for Trim5α restriction [73,74] and a similar picture is emerging for Fv1 [75,76]. However, given the apparent lack of sequence identity between orthoretroviral and FV Gag proteins, it has been unclear how such restriction factors might recognise and restrict FVs. Indeed, the molecular determinants for Trim5α restriction of FVs seem to map to the N-terminal region of FV Gag [43]. Our structural analysis of PFV Gag now reveals that FVs also contain a CA region comprising two domains with folds related to the NtD CA and CtD CA of orthoretroviral Gag. This might suggest a similar mechanism for FV recognition by restriction factors where self-association of the central region through Gag CtD CEN interactions in combination with dimerisation through the Gag N-terminal region [43] could also form hexameric arrays that are targeted by Trim5α. More detailed structural studies will be required to answer this question.

Protein Expression and purification
The DNA sequences coding for PFV-Gag residues 300-477, 300-381 (NtD CEN ), and 381-477 (CtD CEN ) were amplified by PCR from template plasmid pcziGag4 [77] containing the PFV Gag gene. PCR products were inserted into a pET22b expression vector (Novagen) using the NdeI and XhoI restriction sites in order to produce C-terminal His-tag fusions. The correct sequence of expression constructs was verified by automated DNA sequencing (GATC Biotech). His-tagged PFV constructs were expressed in the E. coli strain Rosetta 2 (DE3) and purified using Ni-NTA affinity (Qiagen) and size exclusion chromatography (SEC) on Superdex 75 (GE Healthcare). For NMR studies proteins were grown in minimal media supplemented with ¹⁵NH₄Cl, ¹³C-glucose and/or ²H₂O and purified as described.

Protein structure determination
The solution structures for PFV-Gag(300-477) and the PFV-Gag CtD CEN dimer were calculated using the program ARIA (Ambiguous Restraints for Iterative Assignment v 2.3) [81]. Nine iterations of progressive assignment and structure calculation, combined with NOE distance restraints, hydrogen bonds, dihedral angle restraints predicted by the TALOS program [82], and RDC measurements, were employed in a simulated annealing protocol. For the PFV-Gag CtD CEN homodimer the inter-proton NOE-derived distance restraints present in the filtered NOESY experiments were defined as intermolecular and the corresponding NOEs removed from the 3D ¹³C-NOESY-HSQC.
Initial structures were used to determine the axial and rhombic components of the alignment tensors with the program MODULE [83]. Subsequently, the RDC restraints were added in the final refinement stage of structure calculations. Only data for residues located in rigid secondary structure elements (¹H-¹⁵N NOE > ~0.75) were employed. A final ensemble of the 20 lowest-energy structures, derived from 100 calculated structures and refined in an explicit water box in the last iteration, was selected. The superimposition of the 20 lowest-energy structures and the ribbon diagram of one representative PFV-Gag(300-477) and one PFV-Gag CtD CEN dimer structure are shown in S1A and S4A Figs. The quality of the calculated structure ensembles was assessed and validated with the Protein Structure Validation Suite (PSVS) [84] and Procheck-NMR [85]. For the final 20 lowest-energy NMR structures, no distance or torsional angle restraint was violated by more than 0.5 Å or 5°, respectively. Structure determination details are summarised in Table 1.

¹⁵N Relaxation measurements
The backbone ¹⁵N relaxation parameters, namely the spin-lattice relaxation time T₁, the spin-spin relaxation time T₂ and the steady-state heteronuclear ¹H-¹⁵N NOE, were determined at 25°C on a 700 MHz spectrometer using a ¹⁵N-labeled NMR sample of PFV-Gag(300-477). The time delays used for T₁ experiments were 10, 50, 100, 200, 400, 500, 750, 1000 and 1400 ms, and those for T₂ experiments were 8, 16, 32, 48, 64, 80, 96, 112, 128 and 160 ms. The T₁ and T₂ relaxation data were obtained by extracting the individual peak intensities using nonlinear spectral lineshape modelling and fitting them to a single exponential using routines within NMRPipe [79]. ¹H-¹⁵N NOE values were calculated from peak intensity ratios obtained from spectra with and without ¹H saturation prior to the ¹⁵N excitation pulse.
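As an illustration of the single-exponential fitting step, the following minimal Python sketch estimates a relaxation time from peak intensities recorded at the T₂ delays listed above. It is not the NMRPipe routine used in this work: it assumes the model I(t) = I0·exp(−t/T) and, for simplicity, a log-linear least-squares fit. The function name and the synthetic intensities are hypothetical.

```python
import math

def fit_single_exponential(delays_ms, intensities):
    """Fit I(t) = I0 * exp(-t / T) by linear least squares on ln(I).

    Returns (I0, T), with T in the same units as delays_ms.
    Assumes all intensities are positive (illustrative simplification).
    """
    n = len(delays_ms)
    log_i = [math.log(i) for i in intensities]
    mean_t = sum(delays_ms) / n
    mean_y = sum(log_i) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_ms)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_ms, log_i))
    slope = sxy / sxx                      # slope equals -1 / T
    intercept = mean_y - slope * mean_t    # intercept equals ln(I0)
    return math.exp(intercept), -1.0 / slope

# T2 delays from the text (ms); intensities are synthetic, generated with T2 = 60 ms
t2_delays = [8, 16, 32, 48, 64, 80, 96, 112, 128, 160]
synthetic = [1000.0 * math.exp(-t / 60.0) for t in t2_delays]
i0, t2 = fit_single_exponential(t2_delays, synthetic)
print(f"I0 = {i0:.1f}, T2 = {t2:.1f} ms")
```

With noisy experimental intensities a nonlinear fit of the untransformed data is generally preferable, because the logarithm over-weights the low-intensity points at long delays.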

Structure alignment and comparisons
The protein structure comparison service (SSM) at the European Bioinformatics Institute (http://www.ebi.ac.uk/msd-srv/ssm/) was used to perform initial searches for structural homologues in the PDB. PFV-Gag NtD CEN and CtD CEN were superimposed upon orthoretroviral CA NtDs and CtDs using SUPERPOSE [51] from the CCP4 program package. The fit qualities based on rmsd of Cα positions were ranked using the Q-score. Structural alignments were also produced using the SAP program [53], which uses a comparison based on local structural environment that is less sensitive to local structural variation than the raw rmsd measure. The significance of the SAP comparisons was assessed using customized "decoy" models to provide a background of scores against which the comparison of the native domain structures could be evaluated [52]. A representative selection of five orthoretroviruses for which both NtD CA and CtD CA structures were available was used, allowing a joint probability of their significance to be calculated for each domain pairing.
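The Q-score ranking can be sketched as follows. This minimal Python example assumes the commonly cited SSM definition, Q = N_align² / ((1 + (rmsd/R0)²)·N1·N2) with R0 = 3 Å; the function name, the hit labels and all numbers are hypothetical and are not results from this study.

```python
def q_score(n_aligned, rmsd, n_res_1, n_res_2, r0=3.0):
    """Q-score under the assumed SSM definition.

    n_aligned: number of aligned C-alpha pairs
    rmsd: rmsd of the aligned C-alpha positions (Å)
    n_res_1, n_res_2: residue counts of the two structures
    r0: empirical scaling distance (Å), assumed to be 3.0
    """
    return n_aligned ** 2 / ((1.0 + (rmsd / r0) ** 2) * n_res_1 * n_res_2)

# Hypothetical superpositions of an 82-residue domain on two other domains
hits = {
    "hit_A": q_score(n_aligned=60, rmsd=3.0, n_res_1=82, n_res_2=130),
    "hit_B": q_score(n_aligned=70, rmsd=2.0, n_res_1=82, n_res_2=90),
}
# Rank from best to worst fit: a higher Q means more residues aligned at lower rmsd
ranked = sorted(hits, key=hits.get, reverse=True)
for name in ranked:
    print(name, round(hits[name], 3))
```

Q reaches 1 only for identical structures, so it balances alignment length against rmsd rather than rewarding a low rmsd obtained from a short alignment.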

Analytical Ultracentrifugation
Sedimentation velocity experiments were performed in a Beckman Optima XL-I analytical ultracentrifuge using conventional aluminium double sector centrepieces and sapphire windows. Solvent density and the protein partial specific volumes were determined as described [86]. Prior to centrifugation, samples were prepared by exhaustive dialysis against the buffer blank solution, 20 mM Tris-HCl pH 8, 150 mM NaCl and 0.5 mM TCEP (Tris Buffer). Centrifugation was performed at 50,000 rpm and 293 K in an An-50 Ti rotor. Interference data were acquired at time intervals of 180 s at varying sample concentrations (0.5-2.0 mg/ml). Data recorded from moving boundaries were analysed in terms of the size distribution functions C(S) using the program SEDFIT [87][88][89].
Sedimentation equilibrium experiments were performed in a Beckman Optima XL-I analytical ultracentrifuge using aluminium double sector centrepieces in an An-50 Ti rotor. Prior to centrifugation, samples were dialyzed exhaustively against the buffer blank (Tris Buffer). After centrifugation for 30 h, interference data were collected at 2-hourly intervals until no further change in the profiles was observed. The rotor speed was then increased and the procedure repeated. Data were collected on samples of different concentrations of PFV-Gag(300-477) and PFV-Gag CtD CEN at three speeds and the program SEDPHAT [90,91] was used to determine weight-averaged molecular masses by nonlinear fitting of individual multi-speed equilibrium profiles to a single-species ideal solution model. Inspection of these data revealed that the molecular mass of PFV-Gag(300-477) showed no significant concentration dependency and so global fitting incorporating the data from multiple speeds and multiple sample concentrations was applied to extract a final weight-averaged molecular mass. For PFV-Gag CtD CEN the molecular masses showed significant concentration dependency and so global fitting of a monomer-dimer equilibrium model incorporating the data from multiple speeds and multiple sample concentrations was applied to extract the dimerisation association constant (K A ).
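The concentration dependency expected for a monomer-dimer system can be illustrated with a short calculation. The Python sketch below is not the SEDPHAT fitting procedure: it assumes an ideal equilibrium with K_A = [D]/[M]² and computes the weight-averaged molecular mass at a given total concentration. The function name, the monomer mass and the K_A value are hypothetical.

```python
import math

def weight_average_mass(c_total_molar, k_a, monomer_mass):
    """Weight-averaged mass for an ideal monomer-dimer equilibrium.

    c_total_molar: total protein concentration in monomer units (M)
    k_a: dimerisation association constant (M^-1), K_A = [D] / [M]^2
    monomer_mass: mass of the monomer (Da)
    """
    # Mass conservation: c_total = [M] + 2 * K_A * [M]^2, solved for [M]
    monomer = (-1.0 + math.sqrt(1.0 + 8.0 * k_a * c_total_molar)) / (4.0 * k_a)
    dimer = k_a * monomer ** 2
    # Weight fractions are [M]/c_total (mass m) and 2[D]/c_total (mass 2m)
    return monomer_mass * (monomer + 4.0 * dimer) / c_total_molar

# Hypothetical values: 12 kDa monomer, K_A = 1e5 M^-1
masses = [weight_average_mass(c, 1e5, 12000.0) for c in (1e-6, 1e-5, 1e-4)]
for c, m in zip((1e-6, 1e-5, 1e-4), masses):
    print(f"{c:.0e} M -> {m:.0f} Da")
```

The apparent mass rises from near the monomer value towards the dimer value as concentration increases, which is the signature that distinguishes a self-associating species from a single ideal species whose fitted mass is independent of concentration.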

Electron cryo-tomography and image analysis
PFV wild type and the Gag central domain mutants were examined by cryo-electron tomography. In summary, 2 μL of stock virus solution was mixed with 10-nm gold particles (British-Biocell) diluted in PBS, and the total 2.5 μL solution was applied to amylamine glow-discharged 200 mesh copper Quantifoil (R2/2) grids in the environment chamber (4°C, 100% RH) of a Vitrobot Mark III (FEI) and blotted on both sides with a double layer of paper for 4 seconds before plunging into liquid ethane. The frozen grids were transferred to a Gatan 626 cryo-tomography holder and inserted into the FEI Spirit TWIN microscope operated at 120 keV with a tungsten filament source. Images were recorded unbinned at a nominal magnification of 30,000 (7 Å/pixel) on a 2K×2K Eagle CCD camera at -2.5 μm defocus. Tilt series for tomography were recorded automatically using SerialEM from 0 to ±60° in 2° steps, typically with a total dose of less than 70 e⁻/Å². Tomographic tilt series were aligned using IMOD software [92]. Alignment initially used cross-correlation and then used gold particles as fiducials. Reconstructed 3D volumes were generated by back-projection as well as by the SIRT method. For better visualization, individual virus particles were extracted from the whole tomograms and 50 Å thick sections are shown in Fig 6.

Transfection and virus production
Cell culture supernatants containing recombinant viral particles were generated by transfection of the corresponding plasmids into 293T cells using polyethyleneimine (PEI) as described previously [66,96]. For subsequent Western blot analysis the supernatant generated by transient transfection was harvested, passed through a 0.45-μm filter and centrifuged at 4°C and 25,000 rpm for 3 h in a SW32Ti rotor (Beckman) through a 20% sucrose cushion. The particulate material was resuspended in phosphate-buffered saline (PBS). For cryo-electron microscopy analysis viral particles were produced in serum-free medium, and a further concentration step using Amicon Ultra 0.5 ml 100K Concentrators was included following the first concentration by ultracentrifugation through 20% sucrose, similar to the procedure described recently [54].

Infectivity analysis
Transduction efficiency of recombinant, eGFP-expressing PFV vector particles by fluorescence marker-gene transfer assay was analyzed 72 h post-transduction as described previously [54,95,97]. All transduction experiments were performed at least twice. In each independent experiment the values obtained with the wt construct pcoPG4 were arbitrarily set to 100% and values obtained with other constructs were normalized as a percentage of the wt values.
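The normalisation to wt can be sketched as follows. This minimal Python example assumes that each independent experiment is normalised separately before the percentages are averaged; the function name, the construct labels and all values are hypothetical and are not data from this study.

```python
def normalise_to_wt(values, wt_key="wt"):
    """Express each construct's value as a percentage of the wt value."""
    wt_value = values[wt_key]
    return {name: 100.0 * v / wt_value for name, v in values.items()}

# Two hypothetical independent experiments (arbitrary titre units)
experiments = [
    {"wt": 2.0e5, "mutant_1": 1.0e3, "mutant_2": 5.0e4},
    {"wt": 4.0e5, "mutant_1": 4.0e3, "mutant_2": 1.2e5},
]
normalised = [normalise_to_wt(e) for e in experiments]

# Average the percentages across experiments for each construct
mean_percent = {
    name: sum(n[name] for n in normalised) / len(normalised)
    for name in experiments[0]
}
print(mean_percent)
```

Normalising within each experiment before averaging removes the day-to-day variation in absolute titre, so that only the ratio to wt is compared between experiments.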

Western blot analysis
Cells from a single transfected 100 mm cell culture dish were lysed in detergent-containing buffer and the lysates were subsequently centrifuged through a QIAshredder column (QIAGEN). Protein samples from cellular lysates or purified particulate material were separated by SDS-PAGE on a 10% polyacrylamide gel and analyzed by immunoblotting as described previously [98]. Polyclonal rabbit antisera specific for PFV Gag [99] or residues 1 to 86 of the PFV Env leader peptide (LP) [98], as well as hybridoma supernatants specific for PFV PR-RT (clone 15E10) or PFV integrase (IN) (clone 3E11) [100], were employed. After incubation with species-matched horseradish peroxidase (HRP)-conjugated secondary antibody, the blots were developed with Immobilon Western HRP substrate. The chemiluminescence signal was digitally recorded using a LAS3000 (Fujifilm) imager and quantified using ImageGauge (Fujifilm).

Quantitative PCR analysis
Preparation of particle and cellular samples for qPCR analysis was performed as previously described [54,96]. Primers, Taqman probes and cycling conditions for specific quantification of the PFV genome are summarized in S3 Table. All sample values obtained using a StepOne Plus (Applied Biosystems) qPCR machine were referred to a standard curve consisting of 10-fold serial dilutions of the respective reference plasmid (puc2MD9) containing the target sequences. All sample values included were in the linear range of the standard curves, which spanned from 10 to 10⁹ copies. The values for the DNA or RNA content of viral particle samples obtained by the qPCR analysis were normalized for Gag content determined by quantitative WB as indicated above and are expressed as a percentage of the wt (generated by transfection of cells with pcoPG4, pcoPP, pcoPE and puc2MD9).

The highly conserved PGQA and YxxLGL motifs are highlighted in blue and green respectively and residues at the homodimer interface (helices α5 and α6) are highlighted in red. (B) PFV-Gag CtD CEN monomer structure. The monomer is shown in surface representation with secondary structure depicted as a ribbon. Helices α5-α6 that form the homodimer interface in the structure are shown in red. The PGQA and YxxLGL conserved motifs that combine to form the hydrophobic patch are coloured in blue and green respectively. (TIF)
S1 Table. SSM superpose scores for structural alignments (PDF)
S2 Table. Quantitation of viral cores (PDF)
S3 Table. qPCR primer/probe set (PDF)