Genomic and Proteomic Analyses of the Terminally Redundant Genome of the Pseudomonas aeruginosa Phage PaP1: Establishment of Genus PaP1-Like Phages

We isolated and characterized a new Pseudomonas aeruginosa myovirus named PaP1. The morphology of this phage was visualized by electron microscopy and its genome sequence and ends were determined. Finally, genomic and proteomic analyses were performed. PaP1 has an icosahedral head with an apex diameter of 68–70 nm and a contractile tail with a length of 138–140 nm. The PaP1 genome is a linear dsDNA molecule containing 91,715 base pairs (bp) with a G+C content of 49.36% and 12 tRNA genes. A strategy to identify the genome ends of PaP1 was designed. The genome has a 1190 bp terminal redundancy. PaP1 has 157 open reading frames (ORFs). Of these, 143 proteins are homologs of known proteins, but only 38 could be functionally identified. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis and high-performance liquid chromatography-mass spectrometry allowed identification of 12 ORFs as structural protein coding genes within the PaP1 genome. Comparative genomic analysis indicated that the Pseudomonas aeruginosa phage PaP1, JG004, PAK_P1 and vB_PaeM_C2-10_Ab1 share great similarity. Besides their similar biological characteristics, the phages contain 123 core genes and have very close phylogenetic relationships, which distinguish them from other known phage genera. We therefore propose that these four phages be classified as PaP1-like phages, a new phage genus of Myoviridae that infects Pseudomonas aeruginosa.


Introduction
Bacteriophages (phages) are ubiquitous in the biosphere [1]. Estimates of the total number of phages range from 10^30 to 10^32, approximately tenfold higher than those of bacteria [2]. Numerous phage investigations have been performed worldwide since Frederick William Twort and Félix d'Hérelle first reported the discovery of phages in 1915 and 1917, respectively [3,4]. Phages are potential antimicrobial agents in various clinical or agricultural settings [5,6] and have become important molecular and biological tools in facilitating the development of bioscience. Approximately 6300 different phages have been examined by electron microscopy [7]; however, only 759 of these (721 infecting bacteria and 38 infecting archaea) have been completely sequenced based on data from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/; Bethesda, MD, USA, Oct. 2012). This number is far lower than the number of sequenced bacteria (3433 complete genomes of bacteria and 199 complete genomes of archaea as of 28 Oct. 2012). A detailed dissection of phage genomes would add valuable data to our knowledge of phages and help us understand the evolutionary relationships between phages and bacteria.
Pseudomonas aeruginosa (P. aeruginosa), an opportunistic pathogen, is ubiquitous in the environment and often resistant to a large number of antibiotics. As such, the treatment of P. aeruginosa infections is very difficult [8]. Investigating the biology of phages is important for humans to fight multiple-drug resistant pathogens [9]. Sixty-three complete genome sequences of Pseudomonas phages, most of which infect P. aeruginosa, have become available in GenBank as of 28 Oct. 2012. Among these genome sequences, 18 P. aeruginosa phages belong to the Myoviridae family. Members of this family are efficient killers of bacteria and can affect many aspects of bacterial ecology and evolution. P. aeruginosa phages have been studied for decades for use as therapeutics and typing agents [10]. This group of phages seems to be taxonomically diverse and genetically dissimilar [11]. Currently, most characterized myoviruses of P. aeruginosa are classified into four genera, namely, phiKZ-like phages, P2-like phages, PB1-like phages, and KPP10-like phages [12][13][14][15]. Some P. aeruginosa phages have been characterized but remain unclassified. Detailed characterizations of novel P. aeruginosa phages will be significant for understanding the interactions between P. aeruginosa and its phages and the exploration of useful therapeutic reagents against P. aeruginosa.
The interests of our group are focused on the dissection of phage genomes and their biological issues. We previously isolated and identified three P. aeruginosa phages from the sewages of our affiliated hospitals and designated them as PaP1, PaP2, and PaP3. PaP1 is a virulent phage, whereas both PaP2 and PaP3 are temperate phages. The genome of PaP3 can integrate into the chromosome of the host bacteria through a tRNA gene locus [16].
The present work focuses on the genomic and proteomic analyses of the phage PaP1. The results indicate that PaP1 belongs taxonomically to Myoviridae and that its genome has terminally redundant ends of 1190 bp. High-performance liquid chromatography-mass spectrometry (HPLC-MS) identified 12 PaP1 structural protein coding genes. Based on comparative genomic analysis, we propose that P. aeruginosa phages PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 be classified as a new genus named "PaP1-like phages" within the Myoviridae family.

Morphology of PaP1
PaP1 is identical in structure and dimensions to P. aeruginosa phage PB1 [14]. The head is an icosahedron, as evidenced by the simultaneous presence of hexagonal and pentagonal capsids, and measures 68–70 nm between opposite apices (Figure 1). Shallow depressions in uranyl acetate indicate the presence of capsomers. The head is separated from the tail sheath by an 8 nm-long neck. Uncontracted tails measure 138–140 × 17–20 nm. Contracted tails measure 55 × 22 nm. Base plates are poorly visible on extended tails. Upon contraction, they separate from the sheath and appear as disks of 23 × 3 nm. There are at least 4 straight tail fibers (Figure 1B). In the quiescent tail, the fibers are folded along the sheath.

Biological characteristics of PaP1
PaP1 forms clear plaques (~3 mm in diameter) surrounded by a small semitransparent halo on the lawns of the host bacteria. In rich liquid medium, PaP1 is amplified to high titers (~10^11 PFU/mL). PaP1 particles are stable for over two months of storage at 4 °C and resistant to chloroform. Based on the one-step growth curve of PaP1 (Figure 2), its latent period is about 20 min, its burst period is about 40 min, and the average number of PaP1 progeny produced from one host bacterium is about 65. PaP1 can lyse two other strains of P. aeruginosa (PA4 and PA6), aside from PA1. This result may provide a potential basis for the application of PaP1 in phage therapy.

General genomic characteristics of PaP1
The genome of PaP1 consists of 91,715 base pairs (bp) with a G+C content of 49.36%, which is considerably lower than that of its host (66.34%). The GC skew of the PaP1 genome is shown in Figure 3. In viral genomes, the lowest point on the GC skew curve is typically the origin of replication [17]. Therefore, the putative replication origin of the PaP1 genome is at the end of the genome sequence (Figure 3).
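The G+C content and the position of the GC skew minimum can be computed directly from a sequence. The following is a minimal sketch, not the DNAPlotter procedure used in this work: it assumes a cumulative (G − C)/(G + C) skew over non-overlapping windows, and the function names and the window size are illustrative choices.

```python
def gc_content(seq):
    """Return the G+C fraction (0..1) of a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def cumulative_gc_skew(seq, window=100):
    """Cumulative (G - C) / (G + C) over consecutive non-overlapping windows.

    Assumption: a cumulative skew is used here for illustration; other tools
    may plot the per-window skew instead.
    """
    seq = seq.upper()
    totals = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        running += (g - c) / (g + c) if (g + c) else 0.0
        totals.append(running)
    return totals


def skew_minimum(seq, window=100):
    """Approximate coordinate (end of window) where the cumulative skew is lowest."""
    totals = cumulative_gc_skew(seq, window)
    idx = min(range(len(totals)), key=totals.__getitem__)
    return min((idx + 1) * window, len(seq))
```

Applied to a genome sequence, `skew_minimum` returns a candidate coordinate for the putative replication origin; the result depends on the window size and should be read together with the plotted curve.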
No direct repeats of more than 50 bp, inverted repeats of more than 26 bp, or mirror repeats of more than 17 bp were found in the PaP1 genome, indicating that it does not contain complicated secondary structures. Twelve tRNA genes were found in the PaP1 genome (Table 1). Among the 12 encoded tRNAs, tRNA-Arg, tRNA-Gln, and tRNA-Gly are used preferentially by PaP1, but not by the host. Codon usage analysis indicated that the three tRNAs are important for the protein expression of PaP1, since phage tRNA genes can overcome differences in codon usage between the phage and the host [18].

Determination of PaP1 genome termini
The PaP1 genome was assembled as a circular molecule when sequencing was completed. Digestion with the restriction endonucleases NarI and NotI, each of which has only one cut site in the genomic DNA, released two short bands (about 2.5 and 6.5 kb) in the gel (Figure 4A). Because a single cut in a circular molecule would only linearize it without releasing a short fragment, this result indicates that the PaP1 genome is a linear molecule. Figure 4B shows the recognition sites of NarI and NotI within the linear PaP1 genome. The DNA band indicated by the red arrow in Figure 4A is the 3′ end fragment of the PaP1 genome. The restriction endonuclease FspI was used to digest the PaP1 genome and release the 5′ end fragment, as indicated by the red arrow in Figure 4C. Both the 3′ and 5′ end fragments were purified and used as templates for terminal run-off sequencing with primers P1 and P2 (Figure 4D), respectively.
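The logic of the single-cutter digestion can be illustrated with a small in silico digest. This is a sketch under stated assumptions: `fragment_lengths` is a hypothetical helper, and the cut coordinate in the usage note is an illustrative value, not the actual NarI or NotI site.

```python
def fragment_lengths(genome_len, cut_positions, circular):
    """Return fragment sizes (bp) after cutting at the given coordinates.

    A circular molecule with n cuts yields n fragments; a linear molecule
    with n cuts yields n + 1 fragments.
    """
    cuts = sorted(cut_positions)
    if not cuts:
        return [genome_len]
    if circular:
        # Distance from each cut to the next one, wrapping around the circle.
        # A single cut gives distance 0, which means the full-length molecule.
        return [
            (cuts[(i + 1) % len(cuts)] - cuts[i]) % genome_len or genome_len
            for i in range(len(cuts))
        ]
    edges = [0] + cuts + [genome_len]
    return [right - left for left, right in zip(edges, edges[1:])]
```

For a 91,715 bp genome and a hypothetical single cut at coordinate 89,215, `fragment_lengths(91715, [89215], circular=False)` gives a 89,215 bp fragment plus a 2,500 bp end fragment, whereas `circular=True` gives one full-length fragment and no short band.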
The results of terminal run-off sequencing coincided with case 3 in Figure 4D. The two sequences obtained by P1 and P2 have a long repeat of 1190 bp, suggesting that the repeated sequence is terminally redundant rather than cohesive, since the latter is usually less than 100 bp [16,19,20]. We used S1 nuclease, a single-strand digesting enzyme, to digest the 5′ end fragment. The cut and uncut 5′ end fragments had the same size (Figure 4C), indicating that the 1190 bp terminal sequence is double-stranded. Thus, the natural structure of the PaP1 genome DNA molecule can be described as shown in Figure 5A. Two functionally unknown genes (g156 and g157) were found in the terminally redundant region (Figure 5B).
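The decision rule behind the three cases of Figure 4D can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, both reads are assumed to be oriented on the same strand, and a repeat of exactly 100 bp (which the rule does not cover) is assigned to terminal redundancy by assumption.

```python
def suffix_prefix_overlap(right_end_read, left_end_read):
    """Length of the longest suffix of the right-end read that is also a
    prefix of the left-end read (both reads given on the same strand)."""
    longest = min(len(right_end_read), len(left_end_read))
    for k in range(longest, 0, -1):
        if right_end_read.endswith(left_end_read[:k]):
            return k
    return 0


def classify_termini(overlap_bp, cohesive_max=100):
    """Classify genome ends from the signed overlap of the two run-off reads.

    overlap_bp > 0: the reads share a repeat of that length (case 3).
    overlap_bp == 0: the reads abut with no repeat and no gap (case 1).
    overlap_bp < 0: a gap of that size separates the reads (case 2).
    """
    if overlap_bp == 0:
        return "blunt end"
    if overlap_bp < 0:
        return "3' cohesive end"
    if overlap_bp < cohesive_max:
        return "5' cohesive end"
    # Assumption: a repeat of exactly cohesive_max bp is treated as redundancy.
    return "terminal redundancy"
```

With the 1190 bp repeat reported here, `classify_termini(1190)` returns `"terminal redundancy"`.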

Identification and organization of PaP1 genes
The open reading frames (ORFs) of the PaP1 genome were identified using ORF Finder [21] with ATG, GTG, and TTG as  (Table 1), which is in concordance with the fact that the ORFs of tailed dsDNA phages are tightly and efficiently organized, with little space between genes. The space between genes is usually occupied by putative regulatory sequences, such as promoters and terminators. Putative promoters and terminators of the PaP1 genome are listed in Table 2 Table S1 and illustrated in Figure 6.
The data shows that the PaP1 genome may be divided into several functional modules, revealing an apparent mosaic structure, which is one of the striking characteristics of phage genomes. This finding also suggests that tailed phage genomes evolve from combinations of modules from different species [22].
Two functionally unknown modules reside near the 5′ and 3′ ends of the PaP1 genome, respectively. A large number of small genes with unknown functions cluster in the two modules. The 3′ end module consisting of g114 to g157 may play critical roles in early transcriptional events. PaP1 codes for its own RNA and DNA polymerases, which are probably involved in the synthesis of its RNA and DNA molecules. At least 10 genes that probably control the nucleotide metabolism system of PaP1 cluster on the minus strand (Figure 6). These genes can convert the metabolism of the host cell to produce progeny phages [23]. The products of g032 and g110 are dCMP deaminase and thymidylate synthase, respectively, both of which are involved in dTTP synthesis [24]. Phage-encoded thymidylate synthase appears to have diverged from the precursor of the host bacteria.

Figure 4D (strategy for identifying the PaP1 genome ends by terminal run-off sequencing with primers P1 and P2). Case 1 (blunt end): The two sequences obtained by P1 and P2 do not have repeated regions and they can be assembled to the PaP1 genome sequence with no gap between them. Case 2 (3′-protruded end): The two sequences obtained by P1 and P2 also do not have repeated regions; however, a gap is observed between the sequences once assembled to the PaP1 genome sequence. The sequence within the gap is the 3′-protruded cohesive sequence. Case 3 (5′-protruded end or terminal redundancy): A repeat between the two obtained sequences is observed. If the repeated sequence is less than 100 bp, it is regarded as the 5′-protruded cohesive sequence [16,19,20]; however, if the repeated sequence is over 100 bp, it is regarded as a terminally redundant sequence [12,62,63].
No sequence homologs to phage integrases, repressors, transposases, or excisionases were found, supporting our conclusion that PaP1 is a lytic phage. Genes closely related to phage morphogenesis also cluster together, among which 12 genes encoding structural proteins were identified (as below). The products of g029 and g072 are cell wall hydrolase and endolysin, respectively, both of which belong to enzybiotic factors [25,26]. Although endolysin and holin usually  constitute a two-component lysis system for the liberation of phage progenies from the host cell, no holin homolog was found in the PaP1 genome. The endolysin gene (g072) of PaP1 has been expressed in our laboratory, and we have shown that the purified product can hydrolyze the cell wall peptidoglycan of P. aeruginosa [27].

Identification of phage PaP1 structural proteins
A dsDNA phage particle is made up of a series of structural proteins and a single DNA molecule containing the entire genome. The PaP1 structural protein coding genes cluster in a module of its genome and are preceded by a terminase gene ( Figure 6). The terminase plays an important role in DNA packaging. To identify the structural proteins of PaP1, sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was used to visualize each structural protein in the gel (Figure 7). At least 17 proteins with molecular weights ranging from 6 kDa to 80 kDa were resolved. Each protein band was then excised for HPLC-MS, permitting the allocation of 15 protein bands to 12 corresponding PaP1 genes ( Figure 7). The detailed parameters and results of the mass spectrometry are shown in Table 4. The sequence coverage reaches up to 58%. The sequence coverages of gp067 and gp071 are 3% and 4%, respectively, which are relatively low compared with other proteins; hence, the identification of these proteins as structural components of the phage must be confirmed further.
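Sequence coverage is the percentage of residues in a predicted protein that are covered by at least one identified peptide. The sketch below shows one way to compute it; the function name and the toy protein and peptides in the usage note are hypothetical and are not taken from Table 4.

```python
def sequence_coverage(protein, peptides):
    """Percentage of protein residues covered by at least one peptide.

    Peptides are located by exact string matching; every occurrence counts,
    and overlapping peptides are not double-counted.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)
```

For a toy 22-residue protein `MKTAYIAKQRQISFVKSHFSRQ` and the peptides `TAYIAK` and `QISFVK`, 12 of 22 residues are covered, i.e. about 54.5%. A value as low as 3% to 4% means that only one short peptide supports the assignment, which is why such identifications need further confirmation.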
The predominant band is, as predicted, the major capsid protein (gp051, ~40 kDa); the band (~43 kDa) just above it was also identified as a major capsid protein by mass spectrometry (Figure 7). Peptides corresponding to gp050 (~15 kDa) were found in three bands (Figure 7), and similar to gp051, a small band (~16 kDa) just above it was also identified as gp050. This observation may be explained as the result of the known carryover effect [28,29] of the massively overrepresented major capsid protein and gp051 bands. The gp050 band at the bottom of the gel (~6 kDa) suggests posttranslational processing [30] of gp050. An unusual finding was that gp104 (Figure 7), a p09 homolog of the phage PaP3 (Table S1), is not located in the late gene cluster for phage morphogenesis but among the genes probably involved in DNA replication and control (Figure 6). The four identified PaP1 structural proteins, tape measure, tail fiber, baseplate, and major capsid, with molecular weights ranging from 40 kDa to 80 kDa, are important for phage PaP1 particle formation. Interestingly, the major capsid protein of PaP1 has a molecular weight and an amino acid sequence identical to those of the major capsid proteins of the P. aeruginosa phages JG004 [31], PAK_P1 [32] and vB_PaeM_C2-10_Ab1 (Table 5), indicating a close relationship among these four phages.
A graphical comparison (Figure 8) was performed to illustrate the genomic similarities of PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1. This figure is a visualized description of the corresponding data listed in Table 5. The whole genome sequences of phage PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 show great similarities and most of their DNA sequences appear to have descended from a single common ancestral phage. We also performed a dot plot comparison of the genome sequences of PaP1, JG004, PAK_P1, vB_PaeM_C2-10_Ab1, PAK_P3, and KPP10 ( Figure 9). The results are in concordance with Table 5 and Figure 8.
At the protein level, PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 show striking similarities. Intriguingly, the major capsid proteins of these four phages share 100% sequence identity with each other, which explains their similar morphologies [31,32, and the present work]. As shown in Figure 8, the regions with no blue shading represent minor insertions or deletions among the genome sequences of these four phages. These DNA regions are probably the accumulated mutations for phage adaptation to the host bacteria. The main differences between these four phage genomes are located in their tail fiber encoding genes, indicating that these  phages may have evolved different host cell adsorption mechanisms.
Protein homology analysis. The genomic comparison indicated that phages PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 are closely related. To investigate this further, protein homology analysis was performed. The result reveals that the PaP1 genome shares 123 (78.34%) homologs with JG004, PAK_P1, and vB_PaeM_C2-10_Ab1, and shares 55 (35.03%) homologs with KPP10 and PAK_P3, and shares even fewer homologs with other phage groups. These results strongly indicate that PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 are closely associated and distinguishable from other phage groups. Therefore, these four phages do not belong to any known phage genus and may have descended from a common ancestor.
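The percentages above follow from the 157 predicted PaP1 ORFs, and the set of genes shared with all three related phages is an intersection of per-phage homolog sets. The sketch below illustrates both steps; the function names are hypothetical, and the gene sets in the usage note are toy values rather than the actual homology results.

```python
TOTAL_PAP1_ORFS = 157  # number of predicted PaP1 ORFs reported in this work


def percent_of_orfs(n_shared, total=TOTAL_PAP1_ORFS):
    """Share of PaP1 ORFs with a homolog, as a percentage with two decimals."""
    return round(100.0 * n_shared / total, 2)


def core_genes(homolog_hits):
    """Genes that have a homolog in every compared phage.

    homolog_hits maps a phage name to the set of PaP1 gene IDs for which
    a homolog was detected in that phage.
    """
    gene_sets = list(homolog_hits.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)
```

Here `percent_of_orfs(123)` gives 78.34 and `percent_of_orfs(55)` gives 35.03, matching the values in the text. With toy input such as `{"JG004": {"g001", "g002"}, "PAK_P1": {"g001"}}`, `core_genes` returns `{"g001"}`.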
Phylogenetic analysis. Phylogenetic analysis was performed based on the major capsid proteins. Since related phages are considered to have similar head structural components, they may be clustered based on their major capsid proteins [33]. We chose phages that are listed in Tables 5 and 6 to analyze the phylogenetic relationships between them, and a phylogenetic tree was constructed (Figure 10). Phage PaP2 was included in the tree because it was identified from the same sample (hospital sewage) from which PaP1 and PaP3 were obtained. Figure 10 shows that different P. aeruginosa phage genera cluster in the phylogenetic tree based on the major capsid proteins. This observation is in accordance with the data listed in Table 6, in which myoviruses of P. aeruginosa have been assigned to several phage genera, except for PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1. As expected, these four phages are closely clustered, distinguishing them from other P. aeruginosa phages. This finding reinforces the idea that these four phages descend from a common ancestor. Thus, PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 should be grouped as a new phage genus: "PaP1-like phages" (Figure 10).

Figure 8. Pairwise nucleotide sequence comparison of phages closely related to PaP1. Comparisons were conducted using BLAST 2.25 and displayed using ACT [56]. Highly related sequences are shown by the blue shadings. The intensity of the blue coloration indicates the level of sequence similarity. The minimum score cutoff is 100 and the minimum identity cutoff is 50%. # The full name is vB_PaeM_C2-10_Ab1. doi:10.1371/journal.pone.0062933.g008

Discussion
In the present work, the newly isolated phage PaP1 was assigned as a member of the Myoviridae family. Over 96% of the investigated phages belong to the tailed phages. A total of 6054 tailed phages are known, among which 1558 are from the family Myoviridae [7]. Although members of Myoviridae have many common features, they actually represent a diverse collection of phages.
Phage PaP1 has a linear genome consisting of 91,715 bp with a terminal redundancy of 1190 bp. We designed a new strategy to determine the genome ends of PaP1 (Figure 4D) and this strategy is useful for the identification of genome ends of many other phages. Many known phages have terminally redundant genomes, such as phages T3, T7, P22, SPP1, and T4. Similar to PaP1, several Myoviridae phages of P. aeruginosa (e.g., phiKZ, EL, JG004, and KPP10) also have terminal redundancies (Table 6). These phages employ a variety of mechanisms to generate long DNA concatemers with terminal redundancy, which ensures phage replication without any loss of genetic information [34]. Intriguingly, the terminally redundant region of phage SPO1 contains a "host take-over module" composed of a cluster of 24 genes. This region is responsible for shutting off transcription and translation of the host genes [35]. The terminally redundant region of the PaP1 genome contains two genes with unknown functions (Figure 5B). Identification of the two genes, the PaP1 terminase, and the terminase recognition (pac) site [36] may provide a basis for understanding phage PaP1 morphogenesis.
The 91,715 bp PaP1 genome encodes 157 putative proteins. Based on the predicted functions of these proteins, the genome can be divided into several functional modules, showing an apparent mosaic structure that is characteristic of the phage genomes [22,37]. The PaP1 genome also contains 12,660 bp worth of noncoding regions. Non-coding cannot be interpreted as an indicator of no biological function because some ncRNAs (non-coding RNAs) or DNA binding motifs may be observed within the noncoding regions in phage genomes, which may be useful for the phages but toxic to their hosts [38]. Less than a quarter of the 157 putative proteins have homologs with known functions. Twelve ORFs of the PaP1 genome were identified as structural protein coding genes (Figure 7). The majority of the phage ORFs have unknown functions, which hinders phage studies. The sequence data of these 12 structural proteins have been added to the phage proteomic pool. As similar data emerge, the collective information will be valuable for future phage studies.
Comparative genome analysis revealed that the PaP1 genome shows great similarity with JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 both at the DNA and protein levels, distinguishing them from other Myoviridae. In essence, the phages have very few relationships with other phage genera. Some similarities consistently exist among the tailed phages, suggesting that phages may undergo genetic material exchange from a large shared pool [22]. Substantial evidence suggests that tailed phages may be of very ancient origin and it has been proposed that all of the dsDNA tailed phages share common ancestry [39]. The comparison of coat protein structure and virion architecture can provide a sound basis for grouping viruses together [33]. The major capsid protein of PaP1 shares 100% identity with those of JG004, PAK_P1, and vB_PaeM_C2-10_Ab1; their particles also share identical morphology. This observation supports the idea that PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1 can be grouped together, as shown in Figure 10.
Based on the similarities in biological characteristics and in both DNA and protein sequences, the prominent core genes, and the close phylogenetic relationships among PaP1, JG004, PAK_P1, and vB_PaeM_C2-10_Ab1, we propose that these four phages, with PaP1 as the type virus, can be grouped as a new genus (PaP1-like phages) of myovirus bacteriophages, as shown in Figure 10. We predict that other newly characterized phages will be assigned to the "PaP1-like" genus in the near future, thereby contributing to our understanding of phage biology.

Pseudomonas aeruginosa strains
Six P. aeruginosa strains (PA1, PA2, PA3, PA4, PA5, and PA6) were isolated at the second affiliated hospital of the Third Military Medical University, Chongqing, China, and cultivated in our laboratory. These strains belong to serogroups 9, 20, 6, 6, 20, and 11 of the P. aeruginosa international antigenic typing system, respectively. All six strains were cultivated at 37 °C in LB medium with shaking for ~5 h to reach the log phase.

Phage propagation and purification
Phage PaP1 was isolated from the hospital sewage using P. aeruginosa PA1 as host bacterium, based on a standard lambda phage isolation protocol [40]. A log-phase liquid culture of the PA1 strain was infected with PaP1 (MOI of 1/100) and incubated at 37 °C with shaking. After ~5 h, the culture showed signs of lysis, and a few drops of chloroform were added to it. The culture was then centrifuged at 10,000 × g for 5 min, and the supernatant was stored at 4 °C for subsequent experiments. After storage at 4 °C for over two months, the supernatant was diluted, plated onto a Petri dish overlaid with the PA1 strain, and then cultured at 37 °C until individual plaques could be counted to determine the titer of PaP1 in the supernatant. The other five P. aeruginosa strains (PA2, PA3, PA4, PA5, and PA6) were used as host bacteria to test whether or not PaP1 could lyse them. One-step growth experiments of PaP1 were performed, as previously described [41], to determine phage growth characteristics. Crude phage suspensions of PaP1 were concentrated and purified by PEG8000 precipitation according to the method of Govind et al. [42]. The PaP1 particles were further purified using CsCl gradient ultracentrifugation [43].
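The titer and the multiplicity of infection (MOI) used in such experiments follow from simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the plaque count, dilution, cell density, and volumes in the usage note are assumed values, not measurements from this work.

```python
def titer_pfu_per_ml(plaque_count, dilution, plated_volume_ml):
    """Phage titer (PFU/mL) from a plaque count on one plate.

    dilution is the fraction of the original stock in the plated sample,
    e.g. 1e-8 for a 10^-8 dilution.
    """
    return plaque_count / (dilution * plated_volume_ml)


def phage_volume_ml(moi, cells_per_ml, culture_volume_ml, stock_pfu_per_ml):
    """Volume of phage stock (mL) needed to infect a culture at a given MOI."""
    total_cells = cells_per_ml * culture_volume_ml
    return moi * total_cells / stock_pfu_per_ml
```

For example, 42 plaques from 0.1 mL of a 10^-8 dilution correspond to 4.2 × 10^10 PFU/mL, and infecting 10 mL of a culture at an assumed 10^8 cells/mL at an MOI of 0.01 (1/100) with a 10^10 PFU/mL stock requires 0.001 mL (1 μL) of stock.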

Transmission electron microscopy (TEM)
Filtered phage lysates (about 10^11 PFU/mL) were sedimented for 60 min at 25,000 × g in a Beckman J2-21 centrifuge (Palo Alto, CA, USA) equipped with a JA1.1 fixed-angle rotor, followed by washing in neutral ammonium acetate buffer (0.1 M) under the same conditions. Phage particles were deposited on carbon-coated copper grids, stained with uranyl acetate (2%, pH 4.5) or potassium phosphotungstate (2%, pH 7.0), and examined under a Philips EM 300 electron microscope. Magnification was monitored with T4 phage tails. Dimensions of PaP1 particles were calculated from 20 particles.

DNA extraction and sequencing
EDTA to a final concentration of 20 mM, proteinase K at 50 μg mL^-1, and sodium dodecyl sulfate at 0.5% (w/v) were added to the purified phage PaP1 stock solution. The mixture was incubated at 56 °C for 1 h, after which an equal volume of phenol-chloroform-isoamyl alcohol (25:24:1) was added to it, followed by centrifugation at 5000 × g for 10 min. The aqueous layer was extracted with chloroform at 5000 × g for 10 min. The aqueous layer was collected, mixed with 0.6 volumes of isopropanol, and then stored at −20 °C overnight. The mixture was centrifuged at 4 °C and 12,000 × g for 10 min, and the precipitated DNA was collected and washed with 70% and 100% ethanol, respectively. The obtained PaP1 DNA was suspended in TE buffer (pH 8.0) and stored at −20 °C for use. DNA sequencing was carried out at the Chinese National Human Genome Center (Shanghai, China) using the Roche/454 GS FLX Titanium system [44]. Roche/454 sequence reads were assembled using the Phred/Phrap/Consed software package [45].

Analysis of PaP1 genome ends
Simulation of the restriction enzyme mapping of the PaP1 genome sequence was performed using the software package DNAStar [46]. The PaP1 DNA was digested by selected restriction endonucleases (NarI, NotI, and FspI, purchased from New England Biolabs, Ipswich, MA, USA). For a reaction system of 20 μL, 10 units of the restriction endonuclease (NarI or NotI) and 200 ng of PaP1 DNA were used. The mixture was incubated at 37 °C for 120 min and then used to perform agarose gel electrophoresis. For a reaction system of 100 μL, 1 μg of PaP1 genome DNA and 50 units of restriction endonuclease (NotI or FspI) were used. The mixture was incubated at 37 °C for 100 min. Agarose gel electrophoresis was subsequently performed to separate the restriction fragments containing the 5′ and 3′ ends of the PaP1 genome. These fragments were purified using the Wizard SV Gel and PCR Clean-up System (Promega, Fitchburg, WI, USA). Terminal run-off sequencing was carried out by BGI-Shenzhen (Shenzhen, China). Terminal run-off sequencing of the 3′ end fragment was performed with primer P1 (5′-CGTTCGACGATCCGATGC-3′). The 5′ end fragment was sequenced using primer P2, which represents three primers (P2a, P2b, and P2c). P2a was the first primer used to sequence the 5′ end fragment. P2b (5′-TCGCCTTCTGCCAGTTATG-3′) was designed based on the DNA sequence acquired by P2a. P2c (5′-ATGCCTTGTCGCAGTTGG-3′) was designed based on the DNA sequence acquired by P2b, and terminal run-off sequencing of the 5′ end fragment was performed by P2c. These primers were prepared by BGI-Shenzhen (Shenzhen, China). We used a strategy to explore the terminal sequence of the PaP1 DNA (Figure 4D). Digestion of the 5′ end fragment with S1 nuclease (Takara Bio, Shiga, Japan) at 23 °C for 20 min was carried out to further identify the terminally redundant genome of phage PaP1.

Sequence analysis and genome annotation
The software packages DNAStar [46] and DNAMAN (http://www.lynnon.com/) were used to analyze the basic features of the PaP1 genome sequence. The GC skew of the PaP1 genome was analyzed using DNAPlotter [47]. The internet tool tRNAscan-SE 1.21 [48] was used to predict tRNA genes in the DNA sequence with a cove score cutoff of 20. ORFs were analyzed using NCBI ORF Finder [21], and phage genes were predicted using the software GeneMark.HMM [49] with a length threshold of 100 bp. DNA sequences and protein sequences were scanned for homologs using BLAST [50]. Predicted promoter regions were identified using neural network promoter prediction [51], and putative terminator structures were identified using the web tool FindTerm (http://linux1.softberry.com/berry.phtml).
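The principle of ORF identification can be illustrated with a simple six-frame scan. This is a minimal sketch and does not reproduce the algorithms of NCBI ORF Finder or GeneMark.HMM: it assumes ATG, GTG, and TTG as start codons, the three standard stop codons, the first in-frame start after the previous stop as the ORF start, and a 100 bp minimum length. All names are illustrative.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=100):
    """Scan all six reading frames for ORFs closed by a stop codon.

    Returns (strand, start, end, length) tuples. Coordinates are 0-based,
    half-open, and refer to the scanned strand (the reverse complement
    for strand "-"). The length includes the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i  # first start codon after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= min_len:
                        orfs.append((strand, start, i + 3, length))
                    start = None
    return orfs
```

Candidate ORFs from such a scan would still need to be filtered, for example by homology searches or gene-model scores, before being annotated as genes.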

SDS-PAGE and HPLC-MS of the PaP1 structural proteins
The purified phage particles were resuspended in SDS-PAGE loading buffer [52] and boiled for 5 min before loading onto a 15% (w/v) polyacrylamide gel to identify structural proteins of phage PaP1. We also performed 12% (w/v) and 10% (w/v) SDS-PAGE to better separate proteins of different molecular weight ranges. Protein bands were visualized by staining with Coomassie Brilliant Blue R250 dye for 1 h with shaking and washing with methanol-acetic acid-H2O (5:1:4). Slices were excised from the gel and digested as described previously [53]. HPLC-MS was performed using an HPLC-CHIP-MS/MS ION TRAP 6330 system (Agilent, Santa Clara, CA, USA). The acquired data (mass signals) were compared with all of the putative protein sequences of PaP1 using Mill proteomics software (Rev A.03.02.060; Agilent, Santa Clara, CA, USA) to determine genes with products corresponding to the selected protein bands.