Phage Morphology Recapitulates Phylogeny: The Comparative Genomics of a New Group of Myoviruses

Among dsDNA tailed bacteriophages (Caudovirales), members of the Myoviridae family have the most sophisticated virion design that includes a complex contractile tail structure. The Myoviridae generally have larger genomes than the other phage families. Relatively few “dwarf” myoviruses, those with a genome size of less than 50 kb such as those of the Mu group, have been analyzed in extenso. Here we report on the genome sequencing and morphological characterization of a new group of such phages that infect a diverse range of Proteobacteria, namely Aeromonas salmonicida phage 56, Vibrio cholerae phages 138 and CP-T1, Bdellovibrio phage φ1422, and Pectobacterium carotovorum phage ZF40. This group of dwarf myoviruses shares an identical virion morphology, characterized by usually short contractile tails, and has genome sizes of approximately 45 kb. Although their genome sequences are variable in their lysogeny, replication, and host adaptation modules, presumably reflecting differing lifestyles and hosts, their structural and morphogenesis modules have been evolutionarily constrained by their virion morphology. Comparative genomic analysis reveals that these phages, along with related prophage genomes, form a new coherent group within the Myoviridae. The results presented in this communication support the hypothesis that the diversity of phages may be more structured than generally believed and that the innumerable phages in the biosphere all belong to discrete lineages or families.


Introduction
Like all viruses, phages are classified by the International Committee on Taxonomy of Viruses (ICTV) according to their morphology and nucleic acid composition. The double-stranded DNA tailed phages, or Caudovirales, account for 96% of all the phages observed [1] and belong to three families: Myoviridae, Siphoviridae, and Podoviridae. Members of the Myoviridae, such as the classical and well-studied phage T4, have a characteristic contractile tail structure. The myoviruses are currently further divided into three subfamilies, namely Peduovirinae, Spounavirinae, and Tevenvirinae; each of these subfamilies contains two genera. There are 11 other genera within the Myoviridae recognized by the ICTV [2], but these have not yet been assigned to subfamilies. The overwhelming majority of myoviruses remain unclassified because of insufficient data.
Recently, we described two unassigned small myoviruses that were isolated on the little-studied Gram-negative bacterial genera Iodobacter and Bdellovibrio (Table 1). The Iodobacteriophage φPLPE has a 47.5 kb genome, only about a quarter of the size of the 168 kb genome of phage T4 [3]. Numerous phages with virion dimensions similar to those of φPLPE (isometric heads of 60-70 nm and contractile tails of 65-85 nm) have been isolated on hosts from diverse bacterial genera such as Aeromonas, Bdellovibrio, Bordetella, Pectobacterium, Vibrio, and Yersinia, which belong to the β, γ and δ branches of the Proteobacteria. Such phages of the φPLPE type are morphologically indistinguishable and, hence, could constitute a widespread set of phylogenetically related myoviruses.
A few morphologically similar phages, but with longer tails, have also been reported in the genus Aggregatibacter. One of the latter (Aaφ23) and a similar Yersinia phage (PY100) had been previously sequenced [4,5]. By comparison, the smallest independent myovirus currently known is the Bdellovibriophage φ1402 [6]. Although comparable to the archetypal myovirus T4 in having an elongated head, this phage's virion dimensions are only half those of T4, and its 24 kb genome is merely a seventh of the size of T4's and half that of φPLPE.
To continue our systematic analysis of small myoviruses, we decided to sequence and analyze the phylogenetic relationships of the diverse group of dwarf phages that all share a φPLPE-like virion morphology and are currently unassigned in the Myoviridae family.

Bacteria and Phages
Aeromonas salmonicida phage 56, Vibrio cholerae phages 138 (“group II”) and CP-T1, and their respective hosts are from the collection of the Félix d'Hérelle Reference Center for Bacterial Viruses (accession numbers HER 109, 52, and 373; www.phage.ulaval.ca). These phages were propagated for 3 h at 37 °C in flasks containing 20 mL Trypticase Soy Broth and then filtered through 0.45 µm pore-size membranes. Bdellovibrio phage φ1422 was a gift from Dr. B. Fane from the University of Arizona at Tucson. Pectobacterium carotovorum (formerly Erwinia carotovora) phage ZF40 had been isolated and characterized in Kiev [7].

Electron Microscopy
Phages 56, 138, CP-T1, and φ1422 were sedimented by centrifugation, washed, and stained as described earlier [6] and examined in a Philips EM 300 electron microscope using T4 tails as the magnification control.

DNA Extraction and Sequencing
The DNAs of phages 56, 138, CP-T1, and φ1422 were extracted, precipitated and resuspended as described previously [6,8]. The DNA of phage ZF40 was extracted in Kiev by a similar procedure. The resulting pure DNAs were used for bar-coded library construction and 454 pyrosequencing, which was performed according to the manufacturer's instructions on a quarter picotiter plate of a GS-FLX sequencer (Roche) at the IBIS/Université Laval Plate-forme d'Analyses Génomiques.

Morphology
The diverse phages presented in Table 1 are morphologically related, but their capsids range in size from 55 to 75 nm and their tails from 62 to 115 nm. We ourselves have examined 14 of the phages in Table 1, namely 51, 56, 57, 60, φ1422, Bal, φATCC, L1, CP-T1, 138, 13, 16, 24 and ZF40. The virion morphology characteristic of the φPLPE-like phages is illustrated by the micrographs in Figure 1. For example, the Aeromonas phage 56, like Iodobacter phage φPLPE, has an icosahedral capsid of ~61 nm, a contractile tail of ~81 × 17 nm with no collar, and short terminal fibers of about 10 nm in length. These dimensions are average values from over 150 observed virions. The tails of both phages exhibit faint cross striations or, less frequently, a crisscrossed pattern. The contracted sheaths measure about 37 × 20 nm, and contraction separates the sheath from the base plate, which then appears as a distinct thin disk of ~17 × 2 nm (not shown). A few minor morphological variations do exist in the Aggregatibacter phages, such as Aaφ23, and in the Bdellovibrio phage φ1422 (Table 1). Although otherwise having identical dimensions to the other φPLPE-like phages, the former has a distinctly longer contractile tail structure of ~112 nm and the latter has a slightly prolate capsid of 68 × 40 nm.

Genome Analyses
The genomes of all the φPLPE-group phages that we have sequenced fall within the size range of 43.5 to 48.5 kb, with phage 56 being the smallest and phage ZF40 the largest (Table 2 and Fig. 2). All five genomes assembled as circular contigs, indicating that they are circularly permuted and have direct terminal repeats [11]. GC contents ranged from 43 to 55%, and most of the phages had only slightly lower GC levels than the genomes of their hosts; the only significant exception was the Bdellovibrio phage φ1422, whose GC content was seven percentage points lower than that of its host, perhaps indicating a more recent phage-host association.
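The GC comparison described above reduces to a simple calculation. The following is a minimal sketch, not the authors' pipeline: the function names and the toy sequences are hypothetical, and ambiguous bases (anything other than A, C, G, T) are assumed to be ignored.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction (0-1) of a nucleotide sequence.

    Assumption: symbols other than A, C, G and T (e.g. N) are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / acgt


def gc_difference_points(phage_seq: str, host_seq: str) -> float:
    """Phage GC minus host GC, expressed in percentage points."""
    return 100.0 * (gc_content(phage_seq) - gc_content(host_seq))


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real genome data.
    toy_phage = "ATGCATATTA"
    toy_host = "GCGCATGCAT"
    print(round(gc_difference_points(toy_phage, toy_host), 1))
```

A negative value means the phage is less GC-rich than its host, which is the direction of the difference reported here for the Bdellovibrio phage.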
Curiously, the smallest phage genome actually had the most ORFs (>90 nt in size; consensus between GeneMark and GLIM-

Gene and Protein Functions
Considering their gene content, many of the φPLPE-like phages share similar protein functions in their lysogeny/replication modules (Fig. 2 and Table 2), but these are often encoded by analogs, not homologs, and hence are not included in the compilation of shared ORFs in Figure 3. Detailed annotations of the newly sequenced φPLPE-like phage genomes are presented in Tables S1, S2, S3, S4, S5. Some λ-like proteins involved in recombination and replication are shared by the φPLPE group; however, there are noteworthy exceptions. For example, the two vibriophages have B family DNA polymerases, whereas ZF40 has the β subunit of DNA polymerase I/II which, interestingly, is located upstream of a transposase. Three of the φPLPE-like phages (φPLPE, CP-T1 and 138) have tetR-like cellular transcriptional regulators, whereas φ1422 has a σ54-type regulator. Both types of regulators are implicated in the bacterial response to osmotic stress [13,14], and it is plausible that these bacterial genes have been co-opted by the phages for their own purposes. The trio of phages φPLPE, CP-T1 and 138 also share extended GC-rich repeat sequences (Table 3) that we speculated [3] could be a novel type of attachment site employed for lysogenic integration.
Considering their similar morphology, it is hardly surprising that the φPLPE-like phages all have a related structural/morphogenesis module (Figs. 2 and 3). The gene order of this structural module is well conserved: terminase - portal - head - tail - base-plate - tail fibers. There are two structural module subcomponents that are universally conserved in these phages (Fig. 4). One includes the terminase, portal and head proteins (with the exception of the placement of the terS/L in PY100). The other is near the end of the genome and most likely encodes the tail base-plate and tail fiber genes.
Finally, although many of the phages infect hosts that are human (Vibrio), plant (Pectobacterium) or animal (Aeromonas) pathogens, no identifiable toxins or virulence factors have been detected in any of these phage genomes.

Lytic vs. Temperate Lifestyles
All of the known temperate phages employ one of only three different systems for their lysogenic cycle: lambda-like integration/excision [15], Mu-like transposition [16] or plasmid-like partitioning of N15 [17]. With respect to the φPLPE group, their genomes possess a varying complement of lysogenic genes (Fig. 2, Tables 1 and 2). The phages Aaφ23 and ZF40 are known to be temperate and they have all of the classical lambda-like lysogeny genes (integrase, excisionase, repressor and antirepressor(s)). Moreover, the databases contain prophage sequences similar to both of these phages. Phage ZF40 only lacks an obvious excisionase, but these proteins are usually small and variable and can be difficult to identify by homology; perhaps one of the small proteins just downstream of the ZF40 integrase is a novel excisionase. Curiously, ZF40 also has a transposase, raising the possibility of a Mu-like transposition mechanism. Although this phage could have alternative lysogeny pathways, it seems more likely that the transposase was acquired by a random horizontal transfer and is not involved in lysogeny. Phage CP-T1 is the only other φPLPE-like phage known to be capable of temperate behavior [18,19], but the genome has none of the genes required for any of the three known mechanisms of lysogeny mentioned above. Either this genome carries a novel set of genes that can assure a (pseudo-)lysogenic response that operates via a different mechanism than classical lysogeny, or the Félix d'Hérelle Reference Center conserves a virulent mutant that has a spontaneous deletion of its lysogeny cassette.
The phages PY100 and 138 are lytic phages and, as expected, their genomes carry none of the genes known to be involved in lysogeny. Phages 56 and φPLPE are also lytic, but both genomes contain a lambda-like anti-repressor sequence. This could either be the consequence of the random horizontal transfer of a lysogeny gene or, perhaps, the residue of a previously functional lysogeny cassette that has been largely deleted. Finally, the lytic φ1422 has a ParB homolog (gp42), implying either (as above) a horizontal acquisition or the possibility of a cryptic N15 plasmid-like partitioning system that has not manifested itself under the growth conditions we have employed.

Phylogeny
In view of the strong morphological conservation among the φPLPE-like phages, it was somewhat surprising that their TerL large-subunit terminase sequences did not produce a simple and coherent phylogeny for the φPLPE group, suggesting instead that the group is polyphyletic (Fig. 5A). This non-structural gene, responsible for packaging the DNA into the capsid, has often been successfully employed as a phylogenetic marker gene for other phage groups [20,21]. However, an often-used alternative marker, the portal protein that connects the phage capsid to the tail [22,23], gave a much more coherent phylogeny, with the majority of the φPLPE-like phages forming a monophyletic group (Fig. 5B). It is perhaps relevant that while the terminase is not a structural constituent of the virion, the portal protein is a central part of it.
To obtain a more global phylogenetic overview of the relationships between the different φPLPE phages, we have employed genomic dot-plots of these genome sequences against each other (Fig. 6A). As controls, we have included representatives of the two other well-described groups of small myoviruses: P2 along with its close relative φCTX, and Mu along with its similarly close relative BcepMu. The dot-plot technique has been useful previously, especially to reveal weak sequence conservation between phage genomes that have diverged significantly from a distant common ancestor [24]. For example, for the large and extremely diverse T4 phage group, the virion structural module is visible as faint interrupted diagonal lines in plots of T4 against even some of the most distant members of this group [25]. This method has been successful in detecting distant phylogenetic relationships because both the sequence and synteny of virion structural genes are the most evolutionarily conserved features of phage genomes. Such genomic dot-plots reveal that only CP-T1 and 138, both replicating on the same host, share extended regions of nucleotide sequence homology. However, quantitative similarity analysis of such data (Fig. 6B) clearly demonstrates that all the φPLPE-like phages are related and are significantly different (p < 0.01) from either the Mu or P2 groups (similarities are higher within the group than with the outsiders).
Translating the genomic DNA sequences into a fusion polyprotein generally improves detection levels significantly because of the degeneracy of the genetic code. Consequently, the protein dot-plots reveal additional homologies among the members of the φPLPE group; for example, the broken diagonals for φPLPE (Fig. 6A). Nevertheless, this analysis of phages Aaφ23 through PY100 still does not reveal extended regions of substantial homology, showing only a few homology “hotspots” here and there. In the control sequences, the similarity of the pair Mu/BcepMu is fairly weak as well, whereas P2/φCTX show clearly visible diagonals. Our quantitative similarity analysis of the genome polyprotein sequences (Fig. 6B) convincingly demonstrates that the φPLPE-like phages are significantly different (p < 0.01) from both the Mu and P2 groups. Both the Mu/BcepMu and P2/φCTX pairs show significant similarity within their respective groups, although there appears to have been some genetic blending between these two groups. These different types of polyprotein analyses are largely consistent with the conclusions of the BLAST analyses of the individual proteins presented in Tables S1, S2, S3, S4, S5.
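The dot-plot idea used in this section can be sketched in a few lines. This is an illustrative word-match implementation under stated assumptions, not the software used for Figure 6: the function names, the exact-match criterion and the word length `k` are all hypothetical choices. A dot is recorded wherever a word of length `k` occurs in both sequences, and conserved, syntenic regions then show up as runs of dots along a diagonal.

```python
from collections import defaultdict


def dotplot_matches(seq_a, seq_b, k=3):
    """Return sorted (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    hits = []
    for i in range(len(seq_a) - k + 1):
        # .get() avoids adding empty entries to the index
        for j in index.get(seq_a[i:i + k], ()):
            hits.append((i, j))
    return sorted(hits)


def diagonal_runs(hits, min_len=2):
    """Collapse hits into diagonal runs: (start_i, start_j, run_length).

    A run is a series of hits (i, j), (i+1, j+1), ... on the same diagonal.
    Runs shorter than min_len are discarded as likely chance matches.
    """
    hit_set = set(hits)
    runs = []
    for i, j in sorted(hit_set):
        if (i - 1, j - 1) in hit_set:
            continue  # this hit continues a run that started earlier
        length = 1
        while (i + length, j + length) in hit_set:
            length += 1
        if length >= min_len:
            runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    # Toy strings for illustration only; not real genome data.
    hits = dotplot_matches("ACGTACGT", "TTACGTAA", k=3)
    print(diagonal_runs(hits))
```

The same functions accept amino-acid strings, so a concatenated polyprotein can be compared in the same way as a nucleotide sequence. Real tools tolerate mismatches within a window rather than requiring exact words, which is what allows faint, interrupted diagonals between distant relatives to be seen.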

Conclusions
Dwarf myoviruses such as those described here have been much less studied from a genomic standpoint than their bigger cousins, with only two groups currently recognized on the NCBI Genome site: the P2-like phages (Peduovirinae [2]) representing 20 phage genomes of ~31-41 kb and the Mu-like phages representing only three genomes of about 37 kb. Employing the approach Lavigne et al. [2,9] used to update the Myoviridae and Podoviridae taxonomies, we tallied the protein sequences that the phages with a φPLPE-like morphology share with the φPLPE reference genome (Table 2). The percentages of shared proteins over the entire genomes range from 11 to 35%; however, when the comparison is restricted to just the structure/morphogenesis modules, ignoring the more variable functions in the left-hand part of these genomes, the percentages essentially double to 19-60%. These data, coupled with our genome/protein similarity analyses (Fig. 6B), demonstrate that the φPLPE-like phages constitute a varied yet coherent set of phages that is clearly distinct from the other described myovirus types. This group's unifying characteristics will probably become more evident and expand as additional related genomes are sequenced. Hopefully, for example, more details will emerge regarding the replisomes and lysis/lysogeny controls in these phages, which are much more variable, perhaps to facilitate their adaptation to a wide variety of lifestyles, ecological niches and hosts.
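The shared-protein tally can be illustrated with a short sketch. It assumes BLASTp hits are available as (query, subject, E-value) rows and uses the E-value cutoff of 10^-4 given in the Figure 3 caption. The function name, the protein identifiers and the choice of denominator (all proteins of the query phage) are assumptions made for illustration, not details taken from the paper.

```python
def shared_protein_percent(hits, n_proteins, evalue_cutoff=1e-4):
    """Percent of a phage's proteins with at least one reference hit.

    hits: iterable of (query_id, subject_id, evalue) rows, e.g. parsed from
          tabular BLASTp output against a reference-restricted database.
    n_proteins: total number of proteins in the query phage (assumed
          denominator; the paper's exact convention is not restated here).
    """
    if n_proteins <= 0:
        raise ValueError("n_proteins must be positive")
    # Count each query protein once, however many subjects it matches.
    shared = {query for query, _subject, evalue in hits if evalue < evalue_cutoff}
    return 100.0 * len(shared) / n_proteins


if __name__ == "__main__":
    # Hypothetical hits for illustration only; identifiers are made up.
    toy_hits = [
        ("gp1", "ref_gp19", 1e-30),
        ("gp1", "ref_gp20", 1e-6),   # second hit for gp1, counted once
        ("gp2", "ref_gp05", 0.5),    # above the cutoff, not shared
        ("gp3", "ref_gp33", 1e-5),
    ]
    print(shared_protein_percent(toy_hits, n_proteins=10))
```

Restricting `hits` and `n_proteins` to the proteins of the structure/morphogenesis module gives the module-level percentage in the same way.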
Finally, these phages give us another example of a phylogenomic trend that is becoming increasingly evident as ever larger numbers of diverse phage genomes are sequenced: the core genomes of many groups seem to be built around a phylogenetically conserved virion module encoded by a coherent and largely fixed set of structural genes whose sequences have been mutually constrained during their evolution. We suggest that, as in the case of the T4 phage group, the φPLPE virion's structural module has been subject to the severe constraint of having to maintain a set of strong protein-protein interactions between the diverse virion components to ensure a robust virion structure [26,27]. Eons of Darwinian selection seem to have yielded only a limited number of successful virion structural modules that have the ability to easily adapt to new and varied ecological niches. This appears to have led to an evolutionary scenario where the virion constituents have become relatively fixed while the other, mostly enzymatic, viral functions have been comparatively free to adapt to the requirements of their ever-changing environment. One surprising direct consequence of this scenario has been that, in spite of the enormous recent progress in phylogenomic analysis of phage diversity, a morphological classification still seems to be generally, although not perfectly, valid. For example, the divergent marine vibriophage VpV262 [28] and many cyanophages [29,30] have host-derived DNA polymerases and photosynthesis genes, respectively, yet their virion morphology and the phylogenomics of their core genomes unambiguously place them within either the T7-like Podoviridae or the T4-like Myoviridae. One critical question for future studies to address is: how many phage morphotypes are there, a manageable number or a hopeless diversity of them? Our view is that this number is much smaller than would have been previously estimated and that consequently a coherent genomics-based phylogeny of the phage virosphere, guided by virion morphology, is now a feasible objective. Another important question for the phage genomics community to answer is: how much of phage evolution (and the taxonomy derived from it) is driven by the evolution of their structure vs. their (enzymatic/regulatory) function(s)?

Figure 3. Bipartite nature of the φPLPE group phage genomes, with variable lysogeny/replication modules and conserved structure/morphogenesis modules. With the exception of φPLPE itself (all ORFs colored), only those ORFs shared with φPLPE in the other phages are color-coded as in Fig. 2. Shared ORFs were defined as protein matches in each phage against a φPLPE-restricted BLASTp with an E-value < 10^-4. doi:10.1371/journal.pone.0040102.g003

MER). Perhaps less surprisingly, because of the limited number of Bdellovibrio phages that have been isolated and sequenced, the phage φ1422 has the largest number (64%) of ORFans (ORFs without known homologs). The atypically low numbers (3-8%) of ORFans in the genomes of phages ZF40 and Aaφ23 are the consequence of the presence of closely related prophages in the databases. Prophages similar to ZF40 are found in the bacterial genomes of Yersinia frederiksenii ATCC33641 (NZ_AALE00000000.2) and Y. pseudotuberculosis IP31758 (NC_009708.1). Similarly, the prophage S1249 integrated within the A. actinomycetemcomitans strain D11S-1 genome [12] has a large segment of homology to the Aaφ23 genome. There is also a prophage closely related to Aeromonas phage 56 within the Oxalobacter formigenes HOxBLS genome (NZ_ACDP00000000.1), but its homology is largely restricted to the right half of the genome, whereas the sequences of the prophages most closely related to ZF40 and Aaφ23 are distributed across their entire genomes.

Figure 3 reveals that the overall genome organization of the φPLPE-type phages is relatively well conserved.

Figure 4. The two conserved structural mini-modules in the φPLPE group phages, excluding Aaφ23. Color-coding is as in Fig. 2 and gene numbers refer to the φPLPE genome. ORF60 (“60”) is of unknown function, but could be implicated in the baseplate (BP). The two small ORFs upstream of the terL genes in φ1422 and ZF40 could be the terS genes. The terS/L genes in PY100 are not arranged as in the other phages and are far upstream and not side-by-side. All of the cellular hits shown are (conserved) hypothetical bacterial proteins, except for φ1422, which has a σ54 transcription regulator (“σ”). doi:10.1371/journal.pone.0040102.g004

Figure 5. Neighbor-joining trees of TerL (A) and portal proteins (B; φPLPE gp19 homologs). The eight dwarf φPLPE-like myoviruses are highlighted with red arrows. Branches are colored according to phage family type: red for Myoviruses, blue for Siphoviruses, green for Podoviruses and black for unknown morphology. Values at the nodes are the results of 100 bootstrap replicates. The scale bar indicates 0.1 substitutions per site. doi:10.1371/journal.pone.0040102.g005

Figure 6. Whole genome similarities among φPLPE group phages. (A) Reciprocal dot-plots of the φPLPE-like phages based on whole genome nucleotide sequences (left) or concatenations of all proteins (right). Two Mu-like and two P2-like phages have been included for comparison. (B) Similarity matrices of the DNA sequences (left) and concatenated polyproteins (right) of the phages in (A). The non-φPLPE-like phages and a randomized sequence of φPLPE serve as controls. Also included are the results of the statistical tests comparing the similarity values of the φPLPE-like phages to the controls. Similarity values are highlighted with increasingly darker shades of red. doi:10.1371/journal.pone.0040102.g006