Persistent Infection and Promiscuous Recombination of Multiple Genotypes of an RNA Virus within a Single Host Generate Extensive Diversity

Recombination and reassortment of viral genomes are major processes contributing to the creation of new, emerging viruses. These processes are especially significant in long-term persistent infections where multiple viral genotypes co-replicate in a single host, generating abundant genotypic variants, some of which may possess novel host-colonizing and pathogenicity traits. In some plants, successive vegetative propagation of infected tissues and the introduction of new viral genotypes by vector transmission allow viral populations to increase in complexity over hundreds of years, permitting co-replication and subsequent recombination of the multiple viral genotypes. Using a resequencing microarray, we examined a persistent infection by a Citrus tristeza virus (CTV) complex in citrus, a vegetatively propagated, globally important fruit crop, and found that the complex comprised three major and a number of minor genotypes. Subsequent deep sequencing analysis of the viral population confirmed the presence of the three major CTV genotypes and, in addition, revealed that the minor genotypes consisted of an extraordinarily large number of genetic variants generated by promiscuous recombination between the major genotypes. Further analysis provided evidence that some of the recombinants underwent subsequent divergence, further increasing the genotypic complexity. These data demonstrate that persistent infection by multiple viral genotypes within a host organism is sufficient to drive the large-scale production of viral genetic variants that may evolve into new and emerging viruses.

T30-like CTV genomes. Together, the four sets of primers were capable of amplifying all known CTV genomes as four DNA fragments ranging from 4.5 to 5.5 kb. Total RNA was extracted from a sample (1 g) of CTV-infected tissue using the Trizol reagent (Invitrogen, Carlsbad, CA) as described by the manufacturer. Reverse transcription was carried out at 42 °C for 90 minutes using the ImProm II reverse transcriptase (Promega, Madison, WI) with the protocol provided by the manufacturer. CTV genomic fragments were then amplified by 35 cycles of long-range PCR

RT-PCR amplification, cloning, and sequencing of CTV genome fragments
The 5' fragments (1 kb) of the CTV genomes were amplified using the three 5' PCR primers and a 3' conserved universal primer, CTV942R, complementary to a highly conserved sequence located approximately 1 kb downstream from the 5' end. CTV942R was used as both the RT and the PCR primer. The 1 kb fragments containing the p33 ORF near the middle of the CTV genome were amplified using a set of universal RT-PCR primers capable of amplifying all known CTV genotypes non-selectively. The set comprised an RT primer, CTV12124R, complementary to nucleotides 12124 to 12105, a 5' PCR primer, CTV10834F, identical to nucleotides 10834 to 10853, and a 3' PCR primer, CTV11815R, complementary to nucleotides 11815 to 11794 of the aligned CTV genomes. Reverse transcription to generate CTV genomic cDNA was carried out using the ImProm II reverse transcriptase at 42 °C for 1 hour, and amplification of the DNA fragments was carried out by 35 cycles of PCR using Taq DNA polymerase (Promega, Madison, WI). Each PCR cycle consisted of a 45-second denaturation at 94 °C, a 1-minute annealing at 46 °C, and a 1-minute polymerization at 72 °C with a 2-second increment after each cycle. A 2-minute denaturation at 94 °C was programmed at the beginning and a 10-minute polymerization at 72 °C at the end.
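The cycling program described above can be written out step by step to check the programmed hold time. The following is a minimal sketch, not the thermocycler's own format: the `Step` class and the function names are hypothetical, and it assumes that the 2-second increment means the polymerization step lasts 60 s in the first cycle and grows by 2 s in each later cycle. Ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


def build_program(cycles=35, extension_increment_s=2):
    """Expand the PCR program into a flat list of hold steps."""
    steps = [Step("initial denaturation", 94, 120)]
    for cycle in range(cycles):
        steps.append(Step("denaturation", 94, 45))
        steps.append(Step("annealing", 46, 60))
        # Assumption: the first cycle uses 60 s, then +2 s per later cycle.
        steps.append(Step("polymerization", 72, 60 + extension_increment_s * cycle))
    steps.append(Step("final polymerization", 72, 600))
    return steps


def total_hold_seconds(steps):
    """Sum of programmed hold times, excluding temperature ramps."""
    return sum(step.seconds for step in steps)


program = build_program()
total_seconds = total_hold_seconds(program)
print(f"{len(program)} steps, {total_seconds / 60:.1f} min of programmed holds")
```

Under these assumptions the last cycle's polymerization step lasts 128 s, and the programmed holds add up to a little over two hours.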
PCR products were purified using the Qiagen MinElute PCR Purification Kit (Qiagen, Valencia, CA), and were employed for TA-cloning using a pUC18-based vector containing twin Xcm I restriction sites [6]. Clones with inserts were identified by colony-PCR screening using the M13F and M13R primers that annealed to sequences upstream and downstream of the multiple cloning sites in the vector. Plasmid DNA molecules from randomly selected clones were purified and sequenced in both directions using an ABI 3730XL DNA Analyzer at the Genomic Analysis and Technology Core Facility at the University of Arizona.

Design of CTV resequencing microarray
Genomic sequences of representative CTV isolates were selected for microarray tiling based on the tiling capacity of the resequencing microarray. Nine full-length CTV genomes (T30, T36, VT, SY568, T385, NUagA, Qaha, T3, and H33) available at the time were phylogenetically analyzed using the ClustalX program [7]. The T68 genomic sequence of 13.3 kb was incomplete, and therefore was not included in this analysis. Four apparent clades of CTV isolates, identified by this and other analyses [8,9], guided the selection of a representative genomic sequence from each clade: T36 representing the clade of T36 and Qaha isolates; T30 representing the clade of T30, T385, and SY568 isolates; VT representing the clade containing VT, NUagA, and H33 isolates; and T3 representing the singleton T3 clade. Full-length genomic sequences of T36, T30, VT, and T3 were fully tiled on the resequencing microarray. Although T68 was not included in the original phylogenetic analysis, subsequent analysis of the 5' 3.5 kb sequence of T68 with those of other isolates clearly established T68 as being uniquely different from the four CTV clades. Consequently, all 13,585 available nucleotides of the T68 genome were also completely tiled on the microarray.
Genomes from the remaining CTV isolates in each clade were then compared to that of the tiled, representative isolate to identify unique sequences of the genomes for additional microarray tiling. A unique sequence was defined as a 25-nucleotide region that contained either a run of two or more different nucleotides at the central (13th nucleotide) position, or two or more non-contiguous, different nucleotides within eight nucleotides of the 13th nucleotide. These parameters were dictated by the fact that a run of centrally located, mismatched nucleotides or closely spaced mismatched nucleotides would significantly destabilize the hybridization between the labeled target DNA and the oligonucleotide probe on the microarray. Unique nucleotide sequences identified by this program and selected full genomic sequences of CTV isolates were then tiled on the GeneChip CustomSeq resequencing array with an 8-µm feature size using the Affymetrix photolithographic manufacturing process (Affymetrix, Santa Clara, CA). The complete tiling of 117,088 nucleotides comprises four full-length CTV genomes, one partial genome, and unique sequences from other CTV isolates. In addition, 807 nucleotides from an artificial cDNA clone were included as an internal control. For each nucleotide tiled on the array, four 25-mer oligonucleotides (a quartet) corresponding to the sense strand and four oligonucleotides representing the antisense strand were tiled on the microarray. Thus, the microarray contains a total of 943,160 25-mer oligonucleotide probes.
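The unique-sequence rule and the probe count can be illustrated with a short sketch. This is one possible reading of the definition, not the program used in the study: it assumes the two sequences are already aligned without gaps, that a "run at the central position" means two or more adjacent mismatches that include position 13, and that "non-contiguous" means at least two mismatches within eight nucleotides of position 13 that are not adjacent to each other. All function names are hypothetical.

```python
WINDOW = 25
CENTER = 12  # 0-based index of the 13th nucleotide


def mismatch_positions(ref_window, alt_window):
    """0-based positions where two aligned windows differ."""
    if len(ref_window) != WINDOW or len(alt_window) != WINDOW:
        raise ValueError("both windows must be 25 nucleotides long")
    return [i for i, (a, b) in enumerate(zip(ref_window, alt_window)) if a != b]


def is_unique_window(ref_window, alt_window, flank=8):
    """Apply the two destabilizing-mismatch rules to one 25-nt window."""
    mismatches = set(mismatch_positions(ref_window, alt_window))
    # Rule 1: a run of >= 2 adjacent mismatches that covers the center.
    if CENTER in mismatches and (
        CENTER - 1 in mismatches or CENTER + 1 in mismatches
    ):
        return True
    # Rule 2: >= 2 mismatches near the center, not all in one adjacent pair.
    near = sorted(p for p in mismatches if abs(p - CENTER) <= flank)
    return len(near) >= 2 and near[-1] - near[0] > 1


def total_probes(tiled_nt, control_nt, quartet=4, strands=2):
    """Each tiled nucleotide gets one quartet per strand."""
    return (tiled_nt + control_nt) * quartet * strands


def with_changes(window, positions, base="C"):
    """Helper for the toy example: substitute `base` at given positions."""
    chars = list(window)
    for p in positions:
        chars[p] = base
    return "".join(chars)


ref = "A" * WINDOW  # toy reference window, not real CTV sequence
central_run = is_unique_window(ref, with_changes(ref, [12, 13]))
spaced_pair = is_unique_window(ref, with_changes(ref, [6, 18]))
single_center = is_unique_window(ref, with_changes(ref, [12]))
distant_pair = is_unique_window(ref, with_changes(ref, [0, 24]))
probe_count = total_probes(117_088, 807)
```

With the figures given in the text, `total_probes(117_088, 807)` evaluates to (117,088 + 807) × 8 = 943,160, which matches the stated number of probes on the array.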

Microarray hybridization and base-calling
Amplified PCR fragments were cleaned using the Qiagen MinElute PCR Purification Kit 4.0 to retrieve sequence information. Base calls were made using the ABACUS algorithm [10] with a haploid model. The ABACUS parameters were as follows: no signal threshold = 20, weak signal fold threshold = 20, maximum signal to noise ratio = 20, quality score threshold = 3.0, base reliability threshold across samples = 0, trace threshold = 1, and sequence profile threshold = -0.175.

Contig assembly of sequence fragments generated by resequencing analysis
The output from GSEQ consisted of 252 sequence fragments, corresponding to the full-length genome and sequence fragments tiled on the microarray. Each nucleotide in these fragments was assigned a quality score by GSEQ, on the basis of differential hybridization of perfect matches and mismatches in the sense and antisense quartets as well as on the basis of hybridization characteristics of the neighboring nucleotides. The sequence fragments and the associated quality scores were converted into fasta-format files, and were used to assemble full and partial CTV genomic contigs using the Phrap program [11] implemented in the CodonCode Aligner (CodonCode, Dedham, MA). After experimenting with a wide range of parameters, the following Phrap command parameters produced reliable and reproducible alignments and consensus contigs that reflected the true nature of the genotype(s) in samples: -penalty -2 -minmatch 10 -maxmatch 10 -minscore 12 -vector_bound 0 -masklevel 100 -trim_start 0 -trim_qual 4 -forcelevel 0.
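The conversion of fragments and quality scores into assembler input can be sketched as follows. This is an illustration under several assumptions, not the procedure used in the study: the assembly there was run inside CodonCode Aligner, whereas this sketch prepares input for a stand-alone `phrap` executable; it assumes the quality scores have already been scaled to integers; and it relies on the Phrap convention of reading qualities from a companion file named after the FASTA file with `.qual` appended. The function names and the toy fragment are hypothetical.

```python
import tempfile
from pathlib import Path

# Options as listed in the text above.
PHRAP_OPTIONS = [
    "-penalty", "-2", "-minmatch", "10", "-maxmatch", "10",
    "-minscore", "12", "-vector_bound", "0", "-masklevel", "100",
    "-trim_start", "0", "-trim_qual", "4", "-forcelevel", "0",
]


def write_fasta_and_qual(fragments, fasta_path):
    """Write (name, sequence, qualities) records as FASTA plus .qual."""
    fasta_path = Path(fasta_path)
    qual_path = Path(str(fasta_path) + ".qual")
    with fasta_path.open("w") as fasta, qual_path.open("w") as qual:
        for name, sequence, qualities in fragments:
            if len(sequence) != len(qualities):
                raise ValueError(f"{name}: one quality score per base required")
            fasta.write(f">{name}\n{sequence}\n")
            scores = " ".join(str(int(q)) for q in qualities)
            qual.write(f">{name}\n{scores}\n")
    return fasta_path, qual_path


def phrap_command(fasta_path, executable="phrap"):
    """Build the argument list; running it requires a local Phrap install."""
    return [executable, str(fasta_path), *PHRAP_OPTIONS]


# Toy record for demonstration only (not real resequencing output).
toy_fragments = [("fragment_001", "ACGTACGT", [30, 30, 12, 4, 30, 30, 30, 8])]
with tempfile.TemporaryDirectory() as tmp:
    fasta_file, qual_file = write_fasta_and_qual(
        toy_fragments, Path(tmp) / "fragments.fasta"
    )
    fasta_text = fasta_file.read_text()
    qual_text = qual_file.read_text()
    command = phrap_command(fasta_file)
```

The command is only assembled here, not executed; passing the list to `subprocess.run` would be the natural next step on a machine where Phrap is installed.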

Bayesian phylogenetic and recombination analysis
Sequences of CTV genomes or genomic fragments were initially aligned using the default parameters of the ClustalX program [7], followed by visual inspection and manual alignment as required. Bayesian inference of phylogenetic relationships was carried out using MrBayes 3.1.2 [12], using the general time-reversible model with gamma-shaped rate variation and a proportion of invariable sites (GTR+I+G). This model was determined to be the best-fit model for phylogenetic inference from the input sequences using the ModelTest program [13]. Two parallel runs of Metropolis-coupled Markov chain Monte Carlo simulation, each with one cold and three heated chains, were then carried out for at least 5,000,000 generations, or until the average standard deviation of split frequencies reached 0.01. One tree was sampled every 200 generations during the simulation. A consensus tree was constructed from the sampled trees after discarding the first two-fifths of the sampled trees as burn-in. The Bayesian posterior probability for each node was calculated as the proportion of sampled trees containing the node. With a few exceptions on minor nodes, the posterior probability on all major nodes approached 1.00 after the extensive phylogenetic analysis. Phylogenetic trees were then visualized using the TreeView program [14]. Cross-over junctions of the recombinant molecules were determined initially by RDP2, a recombination detection program that deploys ten published methods to detect recombinant sequences and recombination breakpoints [15], and subsequently confirmed by visual inspection.
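The sampling arithmetic and the node posterior probability described above can be made concrete with a small sketch. It is illustrative only and does not reproduce MrBayes internals: it ignores the tree recorded at generation 0, treats a single run of exactly 5,000,000 generations, and represents each sampled tree as a set of clades, each clade being a frozenset of isolate names. The function names and the four toy trees are made up for the example.

```python
from fractions import Fraction


def sampled_tree_count(generations, sample_every):
    """Number of trees saved when sampling at a fixed interval."""
    return generations // sample_every


def burn_in_count(n_sampled, fraction=Fraction(2, 5)):
    """Number of initial sampled trees discarded as burn-in."""
    return int(n_sampled * fraction)


def node_posterior(trees, clade):
    """Proportion of sampled trees that contain the given clade."""
    if not trees:
        raise ValueError("at least one sampled tree is required")
    return sum(1 for tree in trees if clade in tree) / len(trees)


n_sampled = sampled_tree_count(5_000_000, 200)
n_burn_in = burn_in_count(n_sampled)
n_retained = n_sampled - n_burn_in

# Toy post-burn-in sample: three of four trees group T30 with T385.
t30_t385 = frozenset({"T30", "T385"})
t30_sy568 = frozenset({"T30", "SY568"})
toy_trees = [
    {t30_t385},
    {t30_t385},
    {t30_sy568},
    {t30_t385},
]
posterior = node_posterior(toy_trees, t30_t385)
print(n_sampled, n_burn_in, n_retained, posterior)
```

Under these assumptions a run of 5,000,000 generations yields 25,000 sampled trees, of which 10,000 are discarded and 15,000 contribute to the consensus tree; in the toy sample the T30/T385 node has a posterior probability of 0.75.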