Tales of diversity: Genomic and morphological characteristics of forty-six Arthrobacter phages

The vast bacteriophage population harbors an immense reservoir of genetic information. Almost 2000 phage genomes have been sequenced from phages infecting hosts in the phylum Actinobacteria, and analysis of these genomes reveals substantial diversity, pervasive mosaicism, and novel mechanisms for phage replication and lysogeny. Here, we describe the isolation and genomic characterization of 46 phages from environmental samples at various geographic locations in the U.S. infecting a single Arthrobacter sp. strain. These phages include representatives of all three virion morphologies, and Jasmine is the first sequenced podovirus of an actinobacterial host. The phages also span considerable sequence diversity, and can be grouped into 10 clusters according to their nucleotide diversity, and two singletons each with no close relatives. However, the clusters/singletons appear to be genomically well separated from each other, and relatively few genes are shared between clusters. Genome size varies from among the smallest of siphoviral phages (15,319 bp) to over 70 kbp, and G+C contents range from 45–68%, compared to 63.4% for the host genome. Although temperate phages are common among other actinobacterial hosts, these Arthrobacter phages are primarily lytic, and only the singleton Galaxy is likely temperate.

To further investigate the genetic diversity of phages infecting Actinobacterial hosts, we explored the use of Arthrobacter sp. for the isolation of phages from environmental samples. Arthrobacter spp. are primarily soil organisms, some of which transform or break down environmental pollutants, including hexavalent chromium, 4-chlorophenol, and various aromatic compounds such as pyridine and its derivatives; as such, they may have potential for use in bioremediation [12-14]. Arthrobacter spp. including A. arilaitensis are also components of smear-ripened cheese [15], and some Arthrobacter strains produce antibacterials such as penicillin derivatives [16]. Arthrobacter cells lack mycolic acids, and stain as gram-variable, which is related to a transition from coccus to rod morphology during cell growth [17].

Virion morphologies
Phage particles were observed by transmission electron microscopy with negative staining (Fig 1). Most have siphoviral morphologies with non-contractile, flexible tails, ranging in length from 111.2 (± 11.0) to 242.3 (± 13.3) nm, and isometric heads ranging in size from 55.8 (± 4.0) to 61.4 (± 2.4) nm. Two of the siphoviruses (Circum and Mudcat) have prolate heads with a length of 73.7 (± 1.3) nm and a width of 50.5 (± 2.2) nm (Fig 1, S2 Table). Seven of the phages (Brent, Jawnski, Martha, Sonny, TaeYoung, BarretLemon, and PrincessTrina) have myoviral morphologies with a rigid tail and a tail sheath similar in appearance to P2-like [30] or Mu-like [31] myoviral phages infecting E. coli and other Enterobacteria. Myoviral phages of other Actinobacterial hosts are less common than siphoviruses but include the Cluster C mycobacteriophages [32] and the singleton Rhodococcus phage E3 [33]. Interestingly, Jasmine has a podoviral morphology with a head diameter of 59.8 (± 2.9) nm and a short stubby tail of 10.3 (± 0.9) nm (Fig 1). Two phages of Arthrobacter have been previously described with similar morphologies [20] but their genomes have yet to be sequenced, and to our knowledge, these are the only podoviruses of Actinobacterial hosts among over 1,000 sequenced phages that have been examined morphologically.

Arthrobacter phage genometrics
The Arthrobacter phage genomes were sequenced and putative gene locations and functions were assigned based on bioinformatic analyses as described previously [10,32,34]. Genome lengths vary considerably, from 15,319 bp (Toulouse) to 70,265 bp (PrincessTrina), with an average genome length of 45,832 bp (Table 1). The G+C contents span a broad range, from 45.1% (Mudcat) to 68.4% (Galaxy), such that the G+C content for many of the phages is substantially different from that of the Arthrobacter sp. ATCC 21022 host (63.4%) [25]. The genome termini also vary: many have cohesive ends with 3' single-stranded DNA extensions of 9-13 bases, some are circularly permuted and terminally redundant, and others have a direct terminal repeat ranging from 589 bp to 1584 bp long (Table 1). For two genomes, KellEzio and Kitkat, the ends could not be readily determined, but they are likely circularly permuted (manuscript in preparation). For these and the other circularly permuted, terminally redundant genomes, the sequences were linearized at positions near the 5' ends of the predicted terminase genes.
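The two sequence-level operations mentioned above, computing G+C content and choosing a start coordinate for a circularly permuted genome, can be illustrated with a minimal sketch. The function names, the toy sequence, and the start coordinate are hypothetical and chosen only for illustration; this is not the pipeline used for the genomes in Table 1.

```python
def gc_percent(seq: str) -> float:
    """Return G+C content as a percentage of the unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    total = gc + s.count("A") + s.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def linearize(circular_seq: str, start: int) -> str:
    """Rotate a circular sequence so that 1-based coordinate `start` becomes position 1."""
    if not 1 <= start <= len(circular_seq):
        raise ValueError("start must lie within the sequence")
    i = start - 1
    return circular_seq[i:] + circular_seq[:i]


if __name__ == "__main__":
    toy = "ATGCGCGTTAACCGG"  # hypothetical 15-base sequence, not a real phage genome
    print(round(gc_percent(toy), 1))
    # Hypothetical start coordinate; in practice it would sit near the terminase gene.
    print(linearize(toy, 4))
```

Rotation changes only the reported start of a circularly permuted sequence, so genome length and G+C content are unaffected by the choice of linearization point.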

Arthrobacter phage genome organizations
General genomic features. The ten clusters and singletons Galaxy and Jasmine display a variety of genome organizations, reflecting variations on common architectural themes seen in other phages of the order Caudovirales. In general, the virion structure and assembly genes are organized with the typical syntenic arrangement (terminase, portal, capsid maturation protease, scaffolding protein, major capsid protein, head-tail connectors, major tail subunit, tail chaperone proteins, tape measure protein, and minor tail proteins) [32], but are compactly organized in some genomes (e.g. Cluster AM) and are interrupted by non-structural genes in others (e.g. Cluster AL). In most of the genomes, the lytic functions are encoded immediately downstream of the virion genes, the exceptions being the Cluster AM and AU phages, where the lytic gene is located upstream of the terminase, and the Cluster AT phages, where it is between the terminase and capsid maturation protease genes; the remaining parts of the genomes include DNA metabolism genes and predicted regulatory functions. Galaxy is the only phage to encode an integrase, suggesting that it is temperate. Collectively, 62% of genes in these phages have unknown functions, and we note that the singletons Galaxy and Jasmine are replete with orphams, genes without homologues elsewhere in the Actinobacteriophages. We will briefly discuss the features of each cluster, and representative genome maps are shown in Figs 5-15.
Cluster AK. The twelve Cluster AK phages (Table 1) are related to each other (

there is a small non-coding gap. The genome organizations are unusual in that, although the virion structure and assembly genes have the canonical order, there are numerous and

For example, in Laroye there are five genes inserted between the terminase large subunit (10) and portal genes (16), eight genes are inserted between the protease (17) and major capsid (26) genes, and 37 genes are found between the major capsid subunit (26) and major tail subunit gene (64) (where there are typically 5-6 head-tail connector genes). Although genes coding for ssDNA binding protein (14), aldolase (18), RNase (23), and another DNA binding domain (60) are found in the insertions, most of the inserted genes are of unknown function. With these insertions, the virion structure genes span over 35 kbp, and more than 50% of the 60 kbp genome. The remaining parts of the genomes contain several genes whose functions can be predicted but are atypical in phage genomes, including an RNA helicase (95), an AIG2-like protein (gamma-glutamylcyclotransferase; 97), an amidoligase (98), and a GTPase domain protein (99).
Clusters AM and AU. As noted above, the Cluster AM and AU genomes are distantly related, but share 25-30% of their genes, and the genome maps of Circum (AM) and Gordon (AU) are shown in

[37]. An unusual feature is the apparent fusion of the major capsid subunit and capsid maturation protease functions into a single gene (e.g. Circum 12). This is reminiscent of previously described fusion proteins, such as the capsid and scaffold genes in E. coli phage HK97 [38] and the scaffold and protease fusions in phage Lambda [39].
Another unusual feature in the genomes of Cluster AM and AU phages is the presence of several small genes downstream of the tail genes, many of which encode putative membrane proteins. In Circum, 14 genes in the region of genes 25-61 encode proteins with between one and four membrane spanning domains, and 16 Gordon genes in the region of genes 31-52 (Figs 7 and 8) encode proteins with between one and five membrane spanning domains. The functions of these genes are unknown, but we note that similar arrays of putative membrane proteins are also present in Rhodococcus phages Pepy6 and Poco6 [6], and some of these share amino acid sequence similarity to Cluster AM and AU phage genes.
Cluster AN. The ten Cluster AN phages are very closely related, with small differences at their extreme left ends and some small regions of no sequence similarity (Fig 9 and S5 Fig). They have unusually small genomes for dsDNA phages, and are among the smallest of the Siphoviridae (Table 1). With an average of 15.5 kbp they are slightly larger than the smallest siphovirus genome reported, Rhodococcus phage RRH1 (14,270 bp) [40]. Much of the genome coding potential is occupied by the larger virion structure and assembly protein genes as shown in the map of Maggie (Fig 9), including a fused protease-capsid gene (7). This is similar to the gene fusions in Cluster AM and AU phages, although those share little or no sequence similarity with Maggie gene 7. Interestingly, the small Rhodococcus phage RRH1 has a similarly fused gene, and the predicted protein is a distant relative (27% amino acid identity) of Maggie gp7 (Fig 9). The non-structural genes (20, 21, 22, 23) include those coding for four putative DNA binding proteins, one of which (21) is the only leftwards-transcribed gene. There are no genes coding for DNA metabolism functions, and these phages illustrate how few genes are required for propagation as a dsDNA tailed virus.
Cluster AO. The six Cluster AO phages share substantial genome similarity (

Cluster AR. PrincessTrina and the previously described ArV1 [24] constitute Cluster AR and they share extensive nucleotide sequence similarity. Apart from five leftwards-transcribed genes (33-37), all of the genes are transcribed rightwards (Fig 13). The virion structure and

13) is located between the terminase large subunit and capsid maturation protease genes, a position unique to these Cluster AT phages. Third, there are two genes coding for products related to terminase large subunit genes (4, 10). We also note the presence of two glycosyltransferase genes (28, 90), one of which (28) is located between the capsid subunit and major tail subunit genes.

Singletons Galaxy and Jasmine. Galaxy's genome is 37,809 bp with defined cohesive ends (Fig 15). Galaxy unusually has two genes (2, 54) predicted to code for terminase small subunits. We note that several of the structural genes (e.g. 5, 6, 7) have sequence similarity to some mycobacteriophages, a rare example of genes shared between mycobacteriophages and Arthrobacter phages. However, a high proportion of Galaxy genes are orphams (i.e. they do not have relatives elsewhere in the Actinobacteriophage_692 database and are shown as white boxes in Fig 15), a typical feature of singleton phages [10].
Galaxy is the only temperate phage among this group of Arthrobacter phages, and integrase (Int-Y) and repressor genes are predicted (33 and 34, respectively; Fig 15). Their organization is reminiscent of the mycobacteriophage integration-dependent immunity systems [41], but lacks other common features such as recognizable degradation tags. Also, the attP site is not located within the repressor gene, and is positioned between genes 27 and 28 (coordinates 20,716-20,755), displaced by five genes from the integrase gene (33; Fig 15). The host genome contains two putative attB sites, located in identical tandemly repeated tRNA-Met genes (AUT26_13455 and AUT26_13460). However, we have been unsuccessful in isolating stable Galaxy lysogens in Arthrobacter sp. ATCC 21022, a similar scenario to that reported for Arthrobacter phage ArV2, which also has putative integrase and repressor genes, but for which stable lysogens could not be recovered [23].
The Jasmine genome is notable for the large number of orpham genes that lack relatives in the Actinobacteriophage database (Fig 16); only four of the 58 predicted genes have close relatives. It is the only sequenced Actinobacteriophage with a podoviral morphology (Fig 1), and the genome has 1,330 bp terminal direct repeats. Interestingly, the terminal repeat contains the complete coding region for an Lsr2-like gene, a distant relative of the Lsr2-like genes in several mycobacteriophages [42]. Database comparisons suggest the virion structure and assembly genes are coded in the left part of the genome (genes 11-29), and include a putative tail spike gene (18; HHpred 99.81% probability score with the HK620 tail spike protein).

Lysis functions
Phage lysis functions are of interest as they provide insights into the host cell wall that must be compromised for cell lysis. Arthrobacter spp. lack mycolic acids in their cell walls, and thus the complete absence of lysin B genes encoding esterases cleaving the linkage of mycolic acids to the cell wall [43] is not surprising. However, endolysin genes can be identified in most of the phages, and in most cases a closely linked putative membrane protein likely acts as a holin. Notable exceptions are the Cluster AP phages (Tank, Wilde) for which we have not been able to identify an endolysin gene. We note that there are several small genes at the left ends of the genomes coding for putative membrane proteins that are holin candidates (3, 5, 6), and it is plausible that one of the genes between the leftmost direct terminal repeat and the terminase gene codes for an endolysin that is not discovered by database searches. The Arthrobacter phage endolysins are highly diverse and modular (Table 2), reflecting the complex structures reported for the mycobacteriophage endolysins [44]. Some have three domains (peptidase, amidase, and cell wall binding domains; Clusters AL, AO, AR), whereas others have only subsets of these (Table 2). The phages in Cluster AN (e.g. Maggie) have the lysis functions coded in two separate genes (e.g. Maggie 16 and 17); gp16 has the predicted peptidase activity and the amidase and cell wall binding activities are in gp17. We note that Jasmine has two genes (22 and 30) predicted to code for amidase functions, but 22 is located with the tail genes, and thus is more likely to be associated with phage infection than lysis.

Evolutionary relationships
This collection of sequenced Arthrobacter phages provides insights into their spectrum of diversity relative to phages of other hosts, and how they are related specifically to phages of other Actinobacterial hosts. We note that the Arthrobacter phages distribute into a similar number of clusters and singletons (10 and 2, respectively) to that identified when only 60 mycobacteriophage genomes had been sequenced, which formed 9 clusters and 5 singletons [32]. This reflects a greater overall diversity than seen with phages of Propionibacterium acnes [45]. To investigate this further we examined the distributions of gene phamilies (phams) representing groups of related proteins (see Materials and Methods). The 3272 genes encoded in the 48 genomes are grouped into a total of 1067 phams (S4 Table), 273 of which (26%) are orphams with no close relatives in the database; these are especially prevalent in the singletons Galaxy and Jasmine (Figs 15-17). The proportions of "cluster-associated" phams (those present in all cluster members but not present in any other cluster) vary substantially among the clusters (Fig 17), indicating the degrees to which the overall diversity varies among the clusters; they do not correlate with the numbers of cluster members (Fig 17).
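The two pham categories used here can be made concrete with a small sketch. It assumes a hypothetical input mapping each genome to its set of pham identifiers, and it approximates an orpham as a pham found in exactly one genome of the input set (the text defines orphams against the full database). The genome names, cluster labels, and pham numbers are invented for illustration.

```python
from collections import defaultdict


def classify_phams(genome_phams, genome_cluster):
    """Return (orphams, cluster_associated).

    genome_phams:   genome name -> set of pham ids (hypothetical input format)
    genome_cluster: genome name -> cluster name; singletons are simply omitted
    """
    # Which genomes carry each pham?
    carriers = defaultdict(set)
    for genome, phams in genome_phams.items():
        for pham in phams:
            carriers[pham].add(genome)

    # Which genomes belong to each cluster?
    members = defaultdict(set)
    for genome, cluster in genome_cluster.items():
        members[cluster].add(genome)

    # Approximation: a pham carried by a single genome in this data set.
    orphams = {p for p, gs in carriers.items() if len(gs) == 1}

    # Present in every member of the cluster and in no genome outside it.
    cluster_associated = {
        cluster: {p for p, gs in carriers.items() if gs == genomes}
        for cluster, genomes in members.items()
    }
    return orphams, cluster_associated


if __name__ == "__main__":
    # Invented toy data: three clustered genomes and one singleton (S1).
    phams = {"P1": {1, 2, 3}, "P2": {1, 2, 4}, "P3": {2, 5}, "S1": {5, 6}}
    clusters = {"P1": "X", "P2": "X", "P3": "Y"}
    print(classify_phams(phams, clusters))
```

Requiring the carrier set to equal the cluster membership exactly encodes both halves of the definition at once: presence in all members and absence from every other genome.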
We also examined the extent to which the Arthrobacter phages are exchanging genes between clusters, or are relatively isolated. This is reflected in the numbers of phams in each cluster that are also present in at least one phage in another Arthrobacter cluster (Fig 17, S5 Table). For six clusters (AK, AL, AN, AP, AQ and AT) fewer than 10% of gene phamilies are in this category, reflecting relatively high levels of cluster isolation. Clusters AM and AU have more of these shared phams in part because they share about 25% of their genes with each other. Cluster AO and AR likewise share about 20% of their genes, and these relationships are also reflected in the shared branches in the Splitstree phylogeny shown in Fig 3. We note that similar cluster isolation measures for the mycobacteriophages range from 16-77% with an average of 60.8% [10].
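The cluster isolation measure described in this paragraph is a simple set calculation: for each cluster, the fraction of its phams that also occur in at least one other cluster. The sketch below assumes a hypothetical mapping from cluster name to the union of phams found in its members; the cluster labels and pham numbers are invented and do not correspond to the values in Fig 17.

```python
def shared_fraction(cluster_phams, cluster):
    """Fraction of phams in `cluster` that also occur in at least one other cluster.

    cluster_phams: cluster name -> set of pham ids found in any member of that
    cluster (hypothetical input format).
    """
    own = cluster_phams[cluster]
    if not own:
        raise ValueError("cluster has no phams")
    # Union of the phams found in every other cluster.
    others = set().union(*(p for c, p in cluster_phams.items() if c != cluster))
    return len(own & others) / len(own)


if __name__ == "__main__":
    # Invented toy data for three clusters.
    toy = {"X": {1, 2, 3, 4}, "Y": {2, 5}, "Z": {4, 6, 7}}
    for name in toy:
        print(name, round(100 * shared_fraction(toy, name), 1))
```

A low value indicates an isolated cluster; the same function applied to a set of non-Arthrobacter phams in place of the other clusters would give the corresponding cross-host measure.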
Interestingly, the number of phams present in phages of Actinobacterial hosts other than Arthrobacter (103 of 1052 phams, 9.7%; S6 Table) is similar to the numbers shared between Arthrobacter phage clusters (Fig 17). Thus, the clusters are not only genetically well isolated from each other, but the genes that are shared are just as likely to be shared by non-Arthrobacter phages as they are by other Arthrobacter phages. We note, however, that there is considerable variation among the clusters in the patterns of shared genes. For example, the Cluster AU phages share more genes with other Arthrobacter phages than non-Arthrobacter phages, whereas in Cluster AN, AP, AQ and the singleton Galaxy, the opposite pattern is observed (Fig 17). Moreover, the genes are not shared with the phages of any one different host, but are

). Over half of the shared genes (53/101) are in Actinobacteriophages other than those infecting Mycobacterium, even though those are only 10% of the non-Arthrobacter Actinobacteriophages. The most striking relationship is that between Clusters AM and AU with Rhodococcus phages Poco6 and Pepy6 (S10 Fig), with more than 20 shared genes distributed across the entire genome spans, most with more than 50% amino acid identity (S6 Table); there is also weak but evident nucleotide sequence similarity (S10 Fig). Interestingly, these relationships do not obviously mirror the phylogeny of the actinobacterial hosts. Arthrobacter is more closely related to Streptomyces than it is to Mycobacteria, Gordonia, or Rhodococcus (Fig 18), but only nine Arthrobacter phage phams are shared with Streptomyces phages (of which there are 32 in the database used). In contrast, 36 Arthrobacter phage phams are shared with Rhodococcus phages (of which there are 16 in the database used). Although the numbers of phages available for these types of analyses are still small, there is little evidence of a correlation between shared gene content of representative phages from each actinobacteriophage cluster and phylogenetic proximity of their hosts (Fig 18, S7 Table).

We also tested 21 Arthrobacter phages for their abilities to infect 29 different Actinobacterial hosts, including nine other Arthrobacter species (see Materials and Methods). None of the Arthrobacter phages tested infected any of these strains, and no mutants with expanded host range were identified. These narrow host preferences reflect those reported previously for ArV2 [24] and ArV1 [23].

Concluding remarks
Here we have described 46 newly isolated phages of Arthrobacter sp. ATCC 21022 and compared their genomic sequences. They are richly diverse in morphotype and genotype, with 12 distinct lineages forming ten clusters and two singletons. These clearly represent an undersampling of the broader population of phages infecting this strain, and the diversity of the large collection of mycobacteriophages suggests that the sequenced Arthrobacter phage collection will need to be expanded 10-20-fold to better reflect their genomic diversity. Given the narrow host range of these phages, we also predict that phages isolated on other Arthrobacter strains will reveal phage genomic lineages not previously described. The dearth of temperate phages among those described here is somewhat surprising, as temperate phages represent the majority of phages isolated on M. smegmatis [10] and on Gordonia terrae (unpublished observations). Because all of these phages were isolated from similar environments, the relative preponderance of temperate and lytic phages appears to be a function of the host used for isolation, rather than different environmental parameters, although we note that metagenomic studies suggest that temperate phages are more prevalent in environments with higher bacterial densities [48]. The roles of the hosts in directing evolution of phage lifestyles remain obscure, but isolation and genomic characterization of large sets of phages on hosts within the Actinobacteria will hopefully illuminate this question.

Bacterial strains and media
All phages were isolated on Arthrobacter species, ATCC strain 21022. Either LB media (L-agar base) or PYCa media (containing per 1 liter volume: 1.0 g Yeast extract, 15 g Peptone, 2.5 mL 40% Dextrose, and 4.5 mL 1 M CaCl2) were used for phage isolation and amplification.

Arthrobacter phage isolation, propagation, and virion analysis
All phages were obtained from soil samples with permissions granted (S1 Table). For the soil enrichment protocol, 1-2 grams of soil were incubated at 30˚C with Arthrobacter sp. host in PYCa or LB medium supplemented with 1-4.5 mM CaCl2 for 2-5 days. These enriched soil samples were filtered with 0.22-0.45 μm filters and the filtrates were introduced to a pure culture of Arthrobacter sp. Some soil samples were not enriched with host bacteria prior to performing a plaque assay. For these samples, the soil samples were treated with phage buffer (10 mM Tris-HCl, pH 7.5; 10 mM MgSO4; 68.5 mM NaCl; 1 mM CaCl2), shaken vigorously, filtered, and plated directly on solid overlays containing 0.35% agar and Arthrobacter host and incubated at 30˚C for 16-48 hours. For both the enriched soil samples and the direct soil samples, individual plaques were purified. Once plaque purified, high-titer Arthrobacter phage stocks and plate lysates were obtained using methods described previously for Mycobacterial hosts [26]. Phage particles were spotted onto formvar and carbon-coated 400 mesh copper grids, rinsed with distilled water and stained with 1% uranyl acetate. Images were taken using a FEI Morgagni transmission electron microscope. Measurements were performed on at least 3 particles for each phage.
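As a check on the PYCa recipe given under "Bacterial strains and media", the final concentrations of the two liquid supplements follow from the dilution relation C1·V1 = C2·V2. The sketch below assumes that the final volume is exactly 1 liter and that the 40% dextrose stock is weight per volume; neither is stated explicitly in the recipe, and the function name is hypothetical.

```python
def final_concentration(stock_conc: float, stock_volume_ml: float,
                        final_volume_ml: float) -> float:
    """Solve C1*V1 = C2*V2 for C2; the result has the same units as stock_conc."""
    if final_volume_ml <= 0:
        raise ValueError("final volume must be positive")
    return stock_conc * stock_volume_ml / final_volume_ml


if __name__ == "__main__":
    # 4.5 mL of 1 M (1000 mM) CaCl2 in an assumed final volume of 1000 mL.
    cacl2_mM = final_concentration(1000.0, 4.5, 1000.0)
    # 2.5 mL of 40% dextrose (assumed w/v) in an assumed final volume of 1000 mL.
    dextrose_pct = final_concentration(40.0, 2.5, 1000.0)
    print(f"CaCl2: {cacl2_mM:.1f} mM, dextrose: {dextrose_pct:.2f}%")
```

Under these assumptions the recipe gives 4.5 mM CaCl2, which matches the upper end of the 1-4.5 mM range used for soil enrichment.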

Genome sequencing, annotation, and analysis
Arthrobacter phages were isolated, sequenced, and annotated in the PHIRE or SEA-PHAGES programs. Phage genomes were shotgun-sequenced using either 454, Ion Torrent, or Illumina platforms to at least 20-fold coverage. Shotgun reads were assembled de novo with Newbler versions 2.1 to 2.9. Assemblies were checked for low coverage or discrepant areas, and targeted Sanger

Figs 2-4), with the virion genes in the left part of the genome and non-structural genes in the right part (Fig 5, S2 Fig). All genes are transcribed rightwards, with the exception of five leftwards-transcribed genes near the right end, one of which is a putative DNA binding protein (Fig 5). The portal and a Mu F-like protein are fused as a single gene (6) as shown in Fig 5.

Cluster AL. The two Cluster AL genomes are closely related and differ by 7-8 small insertions or replacements in the right portion of the genomes (Fig 6, S3 Fig). The genomes have been bioinformatically linearized 6.7 kbp upstream of the terminase large subunit gene, where

Fig 2. Nucleotide sequence comparison of Arthrobacter phages. Dot plot of Arthrobacter phage genomes displayed using Gepard [35]. Individual genome sequences were concatenated into a single file arranged such that related genomes were adjacent to each other. The assignment of clusters is shown along both the left and bottom. https://doi.org/10.1371/journal.pone.0180517.g002

Figs 7 and 8 and S4 Fig. The endolysin genes (Circum 7, Gordon 4) are located upstream of the terminase large subunit genes (Figs 7 and 8 and S3 Fig), as seen in Cluster A mycobacteriophages

Fig 3. Splitstree representation of Arthrobacter phages and average nucleotide comparisons of Cluster AO Arthrobacter phages. All Arthrobacter phage predicted proteins were assorted into 1052 phams according to shared amino acid sequence similarities. Each genome was then assigned a value reflecting the presence or absence of a pham member, and the genomes were compared and displayed using Splitstree [36]. Cluster and subcluster assignments derived from the dot plot and ANI analyses are annotated. The scale bar indicates 0.001 substitutions/site. https://doi.org/10.1371/journal.pone.0180517.g003

Fig 4. Pairwise alignment of clustered Arthrobacter phages. The genomes of 23 Arthrobacter phages are shown. Pairwise nucleotide sequence similarity is displayed by color-spectrum coloring between the genomes, with violet as most similar and red as least similar. Genes are shown as boxes above (transcribed rightwards) and below (transcribed leftwards) each genome line; boxes are colored according to the gene phamilies they are assigned [29]. Maps were generated using Phamerator and its database Actinobacteriophage_692. https://doi.org/10.1371/journal.pone.0180517.g004

Figs 2-4, S6 Fig) and a map of the Jawnski genome is shown in Fig 10. The virion structure and assembly genes are canonically ordered, but include a tail sheath and baseplate-like protein genes consistent with the contractile tail virion morphology (Fig 1); the lysis cassette appears to be inserted within the end of the tail gene operon (Fig 10). Jawnski codes for a RecET recombination system (genes 32 and 33) and a beta subunit of DNA Pol III (69), but most of the non-structural genes are of unknown function.

Cluster AP. The two Cluster AP genomes, Tank and Wilde, are closely related with 5-6 small insertions and deletions relative to each other (Fig 11, S7 Fig). The genomes have direct terminal repeats and the virion structure and assembly genes are canonically ordered but

Fig 5. Genome organization of Arthrobacter phage Korra, Cluster AK. The genome of Arthrobacter phage Korra is shown with predicted genes depicted as boxes either above (rightwards-expressed) or below (leftwards-expressed) the genome. Genes are colored according to the phamily designations using Phamerator and database Actinobacteriophage_692, with the phamily number shown above each gene with the number of phamily members in parentheses. https://doi.org/10.1371/journal.pone.0180517.g005

Fig 17. Cluster diversity and inter-cluster relationships. Intra-cluster diversity was determined by the percent of cluster-identifier phams (phams present in all members of a cluster and not found in phages of other clusters, red bars, not calculated for singleton phages), and the percent of orphams (phams present in only one phage, with no homologues in the database, blue bars). Inter-cluster relationships are shown as the proportion of phams present in each Arthrobacter phage cluster that are also present in at least one phage of another Arthrobacter cluster (yellow bars) or in at least one phage infecting a host other than Arthrobacter (green bars). The number of phages in each cluster is indicated in parentheses below the cluster name. https://doi.org/10.1371/journal.pone.0180517.g017