Identification of FAM111A as an SV40 Host Range Restriction and Adenovirus Helper Factor

The small genome of polyomaviruses encodes a limited number of proteins that are highly dependent on interactions with host cell proteins for efficient viral replication. The SV40 large T antigen (LT) contains several discrete functional domains that contribute to the viral life cycle, including the LXCXE (RB-binding) motif and the DNA-binding and helicase domains. In addition, the LT C-terminal region contains the host range and adenovirus helper functions required for lytic infection in certain restrictive cell types. To understand how LT affects the host cell to facilitate viral replication, we expressed full-length LT or individual functional domains of LT in cells, identified interacting host proteins, and carried out expression profiling. LT perturbed the expression of p53 target genes and of subsets of cell-cycle-dependent genes regulated by the DREAM and B-Myb-MuvB complexes. Affinity purification of LT followed by mass spectrometry revealed a specific interaction between the LT C-terminal region and FAM111A, a previously uncharacterized protein. Depletion of FAM111A recapitulated the effects of heterologous expression of the LT C-terminal region, including increased viral gene expression and lytic infection by SV40 host range mutants, as well as adenovirus replication, in restrictive cells. FAM111A functions as a host range restriction factor that is specifically targeted by SV40 LT.


Supplementary Figure Legends
Cells were pulse-labeled with BrdU, harvested at the indicated time points and subjected to FACS analysis to determine their cell-cycle profile, using PI labeling of DNA and anti-BrdU staining as described in [1].

Figure S1
[Figure panels: T98G and HeLa; Mock (pVAX), wtSV40 + pVAX, HR684 + pVAX, HR684 + pVAX-T Ag 627-708; HA, Tubulin.]
Table Legends
Table S1. List of all probesets (514), mapping to 430 unique genes, that were significant in two or more comparisons across conditions, or were differentially expressed by any fragment of SV40 LT. Cluster membership, adjusted p-values and fold changes are annotated for each probeset.
Table S2. List of all enriched GO terms for each cluster (Figure 2B), along with their adjusted p-values and log odds ratios.

Table S3. Summary of MudPIT proteomic analysis of SV40 LT-interacting proteins.
Table S4. iTRAQ-based quantitative analysis of full-length and sub-genomic fragments of SV40 LT. The numbers correspond to the relative enrichment factor measured for proteins detected in experimental and control immunoprecipitations (IPs).
... residues of LT and the addition of 8 missense residues [4].
Quantitative RT-PCR. Cells were harvested in TRIzol (Invitrogen) and RNA was extracted. The RNA was then purified using an RNeasy kit (Qiagen), and cDNA was synthesized using the SuperScript III kit (Invitrogen), both according to the manufacturers' protocols. Real-time PCR quantification of FAM111A depletion was performed in triplicate using a TaqMan probe/primer set specific to FAM111A (Applied Biosystems) on a Stratagene Mx3005P instrument. Beta-actin was used as an internal reference standard (Applied Biosystems product number 4310881E).
Large Scale Immunoprecipitation. Cells were rinsed with PBS containing protease and phosphatase inhibitors (Calbiochem), collected by scraping from the plates and pelleted by centrifugation at 1,000 x g for 5 minutes. Cell pellets were frozen on dry ice and stored at -80°C until analysis. For MudPIT analysis, cell pellets were thawed on ice and extracted using ice-cold extraction buffer (50 mM Tris-HCl pH 8.0, 150 mM NaCl, 0.5% NP-40, 0.5 mM EDTA) supplemented with protease and phosphatase inhibitors. Extracts were clarified by centrifugation at 20,000 x g for 30 minutes and used for large scale immunoprecipitations as described [6,7]. For iTRAQ analysis, cells were processed as previously described [8].
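The text states that beta-actin served as the internal reference for FAM111A qRT-PCR but does not say how relative levels were computed. A minimal sketch, assuming the common comparative Ct (2^-ΔΔCt) method and near-100% amplification efficiency for both assays; the function name and all Ct values below are hypothetical and are not data from this study.

```python
from statistics import mean


def fold_change_ddct(target, reference, target_ctrl, reference_ctrl):
    """Relative target expression versus a control sample by the 2^-ddCt method.

    Each argument is a list of replicate Ct values (the text reports triplicates).
    Assumes both assays amplify with close to 100% efficiency.
    """
    d_ct = mean(target) - mean(reference)                 # normalize to beta-actin
    d_ct_ctrl = mean(target_ctrl) - mean(reference_ctrl)  # same, control sample
    return 2 ** -(d_ct - d_ct_ctrl)


# Hypothetical Ct values for illustration only (not data from this study).
fam111a_depleted = [27.1, 27.0, 27.2]
actin_depleted = [17.0, 17.1, 16.9]
fam111a_control = [24.0, 24.1, 23.9]
actin_control = [17.0, 17.0, 17.0]

remaining = fold_change_ddct(fam111a_depleted, actin_depleted,
                             fam111a_control, actin_control)
print(f"FAM111A mRNA remaining: {remaining:.1%}")
```

Averaging technical replicates before differencing keeps the sketch simple; if amplification efficiencies differ appreciably between the FAM111A and beta-actin assays, an efficiency-corrected model would be more appropriate.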
Quantitative LC/MS methods. Mock and LT IPs were processed in parallel as follows. Proteins in the FLAG elutions were denatured with 0.1% RapiGest (Waters), cysteines were reduced with 10 mM DTT for 30 minutes at 56°C, and proteins were subsequently digested overnight at 37°C using 5 µg of trypsin. Tryptic peptides were desalted by C18 reverse-phase chromatography in batch mode, eluted with 50 µl of 40% acetonitrile / 0.1% TFA and dried down by vacuum centrifugation. Cysteine-containing peptides were captured on thiol-activated Sepharose 4B beads (20 µl packed volume) for 1 hour at RT in 40 µl of 0.5 M triethylammonium bicarbonate (TEAB). After two washes with 100 µl of 0.5 M TEAB, beads were resuspended in 40 µl of 0.5 M TEAB and labeled with 70 µl of iTRAQ reagent for 1 hour at RT with agitation. After two further washes with 100 µl of 0.5 M TEAB, beads from the mock and LT samples were combined, and captured peptides were eluted with 40 µl of 10 mM DTT and immediately alkylated with 20 mM iodoacetamide. Peptides were separated using a 0 to 40% acetonitrile gradient on an inline nanoACQUITY system (Waters) and analyzed on a QSTAR Elite mass spectrometer (Applied Biosystems) using a dual scan method [9]. A Digital PicoView ESI source (New Objective, Woburn, MA) was used to facilitate positioning of the emitter tip at the orifice of the mass spectrometer during each analysis. Data files were extracted using ProteinPilot software (Applied Biosystems). Peptide sequences and quantitative information (iTRAQ reporter ion peak areas) were independently retrieved and merged using in-house multiplierz scripts. iTRAQ ratios for all peptides with a score above 10 were summed for each protein, and the log2 of the resulting value was calculated.
Multidimensional protein identification technology (MudPIT). ... alkylated with 10 mM IAM (iodoacetamide, Sigma). As described in [10], a two-step digestion procedure was used. Endoproteinase Lys-C (Roche) was added to 0.5 µg for at least 6 hours at 37°C, then the sample was diluted to 2 M urea with 100 mM Tris-HCl, pH 8.5. Calcium chloride was added to 2 mM, and digestion with 0.5 µg trypsin (Promega) was allowed to proceed overnight at 37°C with shaking. The reaction was quenched by adding formic acid to 5%, and the peptide mixture was loaded onto a 100 µm fused silica microcapillary column packed with 8 cm of reverse-phase material (Aqua, Phenomenex), followed by 3 cm of 5-µm strong cation exchange material (Partisphere SCX, Whatman) and 2 cm of 5-µm C18 reverse-phase material [11].
The loaded microcapillary column was placed in-line with a quaternary Agilent 1100 series HPLC pump. Overflow tubing was used to decrease the flow rate from 0.1 ml/min to about 200-300 nl/min. Fully automated 10-step chromatography runs were carried out [12]. Three elution buffers were used: 5% acetonitrile, 0.1% formic acid (Buffer A); 80% acetonitrile, 0.1% formic acid (Buffer B); and 0.5 M ammonium acetate, 5% acetonitrile, 0.1% formic acid (Buffer C). Peptides were sequentially eluted from the SCX resin to the reverse-phase resin by increasing salt steps, followed by an organic gradient. The last two chromatography steps consisted of a high-salt wash with 100% Buffer C followed by the acetonitrile gradient. Application of a 2.5 kV distal voltage electrosprayed the eluting peptides directly into an LTQ linear ion trap mass spectrometer equipped with a nano-LC electrospray ionization source (ThermoFinnigan). Full MS spectra were recorded over a 400 to 1,600 m/z range, followed by five tandem mass (MS/MS) events sequentially generated in a data-dependent manner on the first to fifth most intense ions selected from the full MS spectrum (at 35% collision energy). Mass spectrometer scan functions and HPLC solvent gradients were controlled by the Xcalibur data system (ThermoFinnigan).
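The iTRAQ summary step described under Quantitative LC/MS methods (summing the ratios of all peptides scoring above 10 for each protein, then taking log2) can be sketched as follows. This is a minimal illustration, not the in-house multiplierz scripts: the tuple layout, the function name `protein_log2_ratios` and the protein IDs are hypothetical, and the ratio is assumed to be already expressed as LT/mock per peptide.

```python
import math
from collections import defaultdict


def protein_log2_ratios(peptides, min_score=10.0):
    """Per-protein log2 of summed iTRAQ ratios.

    peptides: iterable of (protein_id, peptide_score, itraq_ratio) tuples, where
    itraq_ratio is the LT/mock reporter-ion ratio for one peptide (hypothetical
    input layout; the original multiplierz scripts are not reproduced here).
    """
    summed = defaultdict(float)
    for protein, score, ratio in peptides:
        if score > min_score:  # keep only peptides scoring above 10
            summed[protein] += ratio
    # log2 is undefined for a zero sum, so such proteins are skipped.
    return {p: math.log2(total) for p, total in summed.items() if total > 0}


peptides = [
    ("protein_A", 35.2, 6.0),
    ("protein_A", 22.0, 2.0),
    ("protein_A", 8.0, 50.0),  # score below threshold: ignored
    ("protein_B", 40.0, 0.5),
    ("protein_B", 12.5, 0.5),
]
print(protein_log2_ratios(peptides))  # {'protein_A': 3.0, 'protein_B': 0.0}
```

Because ratios are summed rather than averaged, as the text describes, the resulting value also grows with the number of qualifying peptides per protein.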
SEQUEST [13] was used to match MS/MS spectra to peptides in a database of 61437 amino acid sequences, consisting of 35742 human proteins (non-redundant entries from the NCBI 2008-03-04 release), 177 common contaminants such as human keratins, IgGs and proteolytic enzymes, and, to estimate false discovery rates (FDR), 30723 randomized versions of the non-redundant protein entries (each keeping the same amino acid composition and length). The validity of peptide/spectrum matches was assessed using the SEQUEST-defined parameters cross-correlation score (XCorr) and normalized difference in cross-correlation scores (DeltCn). Peptide/spectrum matches were retained only if they had a DeltCn of at least 0.08 and a minimum XCorr of 1.8 for singly-, 2.5 for doubly- and 3.5 for triply-charged spectra. In addition, peptides had to be fully tryptic and at least 7 amino acids long. Combining all runs, proteins had to be detected by at least 2 such peptides, or by 1 peptide with 2 independent spectra. Proteins that were subsets of others were removed. DTASelect/CONTRAST [14] was used to select, sort and compare peptide/spectrum matches passing these criteria. Under these criteria, the FDR ranged from 0 to 1.81%.
To estimate relative protein levels, spectral counts were normalized [15,16]: for each non-redundant protein k detected in a particular MudPIT analysis, Normalized Spectral Abundance Factors (NSAFs) were calculated as follows:
$$\mathrm{NSAF}_k = \frac{\mathrm{SpC}_k / L_k}{\sum_{i=1}^{N} \left( \mathrm{SpC}_i / L_i \right)}$$
where SpC_k is the number of spectra matching protein k, L_k is its length in amino acids, and N is the total number of non-redundant proteins detected in that analysis. To further refine spectral counting, a new algorithm was implemented on this dataset to handle peptides shared between multiple proteins. Spectral counts for shared peptides were counted only once and distributed according to the spectral count contribution of peptides unique to each isoform (as a way to estimate the relative proportion between isoforms). NSAFs were then calculated from these distributed spectral counts (dSpC) [17].
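The filtering thresholds and the distributed spectral counting described above can be summarized in a short sketch. It is not DTASelect/CONTRAST or the published dNSAF code: the record layout `PSM` and the functions `passes_filter`, `protein_passes` and `dnsaf` are hypothetical, charges above 3+ are rejected because the text gives no threshold for them, and shared spectra for proteins with no unique evidence are dropped (both assumptions, also noted in the comments).

```python
from collections import defaultdict
from dataclasses import dataclass

# Minimum XCorr per precursor charge, as stated in the text. No threshold is
# given for charges above 3+, so those spectra are rejected here (assumption).
MIN_XCORR = {1: 1.8, 2: 2.5, 3: 3.5}
MIN_DELTA_CN = 0.08
MIN_PEPTIDE_LENGTH = 7


@dataclass
class PSM:
    """One peptide/spectrum match (hypothetical record layout)."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float
    fully_tryptic: bool


def passes_filter(psm):
    """Apply the peptide/spectrum match criteria listed in the text."""
    min_xcorr = MIN_XCORR.get(psm.charge)
    return (
        min_xcorr is not None
        and psm.xcorr >= min_xcorr
        and psm.delta_cn >= MIN_DELTA_CN
        and psm.fully_tryptic
        and len(psm.peptide) >= MIN_PEPTIDE_LENGTH
    )


def protein_passes(spectra_per_peptide):
    """At least 2 peptides, or 1 peptide supported by at least 2 spectra."""
    return len(spectra_per_peptide) >= 2 or any(
        n >= 2 for n in spectra_per_peptide.values()
    )


def dnsaf(spectral_counts, peptide_proteins, lengths):
    """NSAF computed from distributed spectral counts (dSpC).

    spectral_counts:  peptide -> number of spectra
    peptide_proteins: peptide -> set of proteins containing that peptide
    lengths:          protein -> length in amino acids
    """
    # Spectra from peptides unique to a single protein.
    unique = defaultdict(float)
    for pep, count in spectral_counts.items():
        proteins = peptide_proteins[pep]
        if len(proteins) == 1:
            unique[next(iter(proteins))] += count

    # Shared spectra are counted once and split in proportion to unique counts.
    dspc = defaultdict(float, unique)
    for pep, count in spectral_counts.items():
        proteins = peptide_proteins[pep]
        if len(proteins) > 1:
            total_unique = sum(unique.get(p, 0.0) for p in proteins)
            if total_unique == 0:
                continue  # no unique evidence to split by; dropped (assumption)
            for p in proteins:
                dspc[p] += count * unique.get(p, 0.0) / total_unique

    # Length-normalize, then scale so values sum to 1 within the analysis.
    saf = {p: c / lengths[p] for p, c in dspc.items()}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}


counts = {"pepA": 6, "pepB": 2, "pepS": 4}
proteins = {"pepA": {"P1"}, "pepB": {"P2"}, "pepS": {"P1", "P2"}}
print(dnsaf(counts, proteins, {"P1": 100, "P2": 50}))
```

In the toy example, the 4 shared spectra of pepS are split 3:1 between P1 and P2 because their unique counts are 6 and 2, giving dSpC values of 9 and 3 before length normalization.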
Yeast two-hybrid analysis. Full-length proteins or fragments of LT and FAM111A were tested against each other in a matrix-style yeast two-hybrid experiment in both orientations, as either GAL4 DNA-binding domain or GAL4 transactivation domain fusion proteins, as previously described [18][19][20][21].
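Testing "in both directions" means that every LT construct is paired with every FAM111A construct twice, once with each protein fused to the GAL4 DNA-binding domain. A minimal sketch of enumerating that matrix, assuming hypothetical construct names and the shorthand "DB-" for DNA-binding domain fusions and "AD-" for transactivation domain fusions:

```python
from itertools import product


def y2h_pairs(lt_constructs, fam111a_constructs):
    """Bait/prey combinations for a matrix-style two-hybrid test in both orientations.

    "DB-" marks a GAL4 DNA-binding domain fusion (bait) and "AD-" a GAL4
    transactivation domain fusion (prey).
    """
    pairs = []
    for lt, fam in product(lt_constructs, fam111a_constructs):
        pairs.append((f"DB-{lt}", f"AD-{fam}"))  # LT construct as bait
        pairs.append((f"DB-{fam}", f"AD-{lt}"))  # FAM111A construct as bait
    return pairs


# Construct names are hypothetical placeholders.
for bait, prey in y2h_pairs(["LT_full_length", "LT_fragment_1"], ["FAM111A_full_length"]):
    print(bait, "x", prey)
```

Testing each pair in both orientations guards against fusion-specific artifacts, such as a DNA-binding domain fusion that self-activates the reporter.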