Abstract
Salmonella enterica Serovar Typhimurium (Salmonella) and its bacteriophage P22 are a model system for the study of horizontal gene transfer by generalized transduction. Typically, the P22 DNA packaging machinery initiates packaging when a short sequence of DNA, known as the pac site, is recognized on the P22 genome. However, sequences similar to the pac site in the host genome, called pseudo-pac sites, lead to erroneous packaging and subsequent generalized transduction of Salmonella DNA. While the general genomic locations of the Salmonella pseudo-pac sites are known, the sequences themselves have not been determined. We used visualization of P22 sequencing reads mapped to host Salmonella genomes to define regions of generalized transduction initiation and the likely locations of pseudo-pac sites. We searched each genome region for the sequence with the highest similarity to the P22 pac site and aligned the resulting sequences. We built a regular expression (sequence match pattern) from the alignment and used it to search the genomes of two P22-susceptible Salmonella strains, LT2 and 14028S, for sequence matches. The final regular expression successfully identified pseudo-pac sites in both LT2 and 14028S that correspond to generalized transduction initiation sites in mapped read coverages. The pseudo-pac site sequences identified in this study can be used to predict locations of generalized transduction in other P22-susceptible hosts or to initiate generalized transduction at specific locations in P22-susceptible hosts with genetic engineering. Furthermore, the bioinformatics approach used to identify the Salmonella pseudo-pac sites in this study could be applied to other phage-host systems.
Author summary
Bacteriophage P22 has been a genetic tool and a key model for the study of generalized transduction in Salmonella since the 1950s, yet certain components of the generalized transduction molecular mechanism remain unknown. Specifically, the locations and sequences of pseudo-pac sites, hypothesized to facilitate packaging of Salmonella DNA by P22, have not been determined to date. In this study, we identified the specific locations and sequences of the pseudo-pac sites frequently recognized by P22 in Salmonella genomes. The identification of highly efficient pseudo-pac sites in Salmonella helps us understand the sequence specificity necessary for P22 pac site recognition and paves the way for more targeted use of generalized transduction with P22.
Citation: Maier JL, Gin C, Callahan B, Sheriff EK, Duerkop BA, Kleiner M (2024) Pseudo-pac site sequences used by phage P22 in generalized transduction of Salmonella. PLoS Pathog 20(6): e1012301. https://doi.org/10.1371/journal.ppat.1012301
Editor: Patrick Secor, University of Montana, UNITED STATES
Received: April 3, 2024; Accepted: May 29, 2024; Published: June 24, 2024
Copyright: © 2024 Maier et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: P22 sequencing reads used for LT2 read mapping were generated in Kleiner et al. (2020) https://doi.org/10.1186/s40168-020-00935-5 and are available at ENA (Study: PRJEB6941, Sample: SAMEA2690949). P22 sequencing reads used for 14028S read mapping are available at ENA (Study: PRJEB72417, Sample: SAMEA115180785). The reference genomes used for LT2 and 14028S read mapping are from NCBI RefSeq NC_003197.2 and NC_016856.1, respectively.
Funding: This work was supported by funding from the NC State University Data Science Academy and by the National Institute of General Medical Sciences and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Number R35GM138362 (MK) and R01AI141479 (BAD). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or NCSU Data Science Academy. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. NCSU DSA: https://datascienceacademy.ncsu.edu/seed-grants/ NIGMS: https://www.nigms.nih.gov/ NIAID: https://www.niaid.nih.gov/ NIH: https://www.nih.gov/.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Transduction, the transfer of DNA between bacterial cells by bacteriophages, can lead to horizontal gene transfer of entire operons of genetic material and can cause dramatic changes in bacterial phenotypes [1–4]. Generalized transduction, one of several potential modes of transduction, was first discovered in 1952 in the bacteriophage P22 and Salmonella enterica Serovar Typhimurium LT2 (LT2), thus making these a model system for generalized transduction [5,6]. Generalized transduction was initially thought to be a random transfer of host DNA, but when the frequencies of transduced LT2 gene markers were quantified, it became clear that transduction frequencies differed widely across the genome [7–9]. Similar transduction locations and frequencies were seen in 2 recent studies that used mapped P22 DNA sequencing reads to visualize transduction patterns in LT2. These studies demonstrate that the P22-facilitated generalized transduction of LT2 is nonrandom and consistent between methods and experiments [10,11]. The observed pattern consists of sharp increases in read coverage followed by sloping decreases of coverage across several regions of the LT2 genome (Fig 1A).
Fig 1. (A) Coverage plot of the Salmonella enterica Serovar Typhimurium LT2 (LT2) genome with sequencing reads from purified P22. Black vertical lines indicate the eight initiation sites for generalized transduction and the locations of the associated pseudo-pac sites along the LT2 genome. The exact locations of the pseudo-pac site sequences identified in this study are highlighted in pink on subsets of the LT2 read coverage plot associated with each site. The sequence of the pseudo-pac site candidate associated with each site is displayed above its respective read coverage plot. (B) A multiple sequence alignment (MSA) generated with ClustalW of the 8 pseudo-pac site candidate sequences, the P22 pac site and the respective neighboring genome sequences. The location of the 12-bp pac site consensus region, identified in Casjens and colleagues [12] and further characterized by Wu and colleagues [13], is defined above the MSA. The regular expression pattern built from the pseudo-pac site candidate sequences is displayed below the MSA.
P22 uses a headful packaging mechanism in which a short sequence of DNA, known as the pac site, is recognized prior to packaging initiation. After initiation, the P22 DNA packaging machinery packages several capsids in series using the same concatemer of DNA on which the pac site is located [14–16]. Generalized transduction by P22 occurs when its small terminase, responsible for pac site recognition [17,18], recognizes a sequence in the host genome that is similar to the phage’s pac site (i.e., pseudo-pac site), leading to initiation of packaging on the bacterial chromosome [9,18–20]. The locations of pseudo-pac sites along a bacterial chromosome lead to the nonrandom generalized transduction patterns observed between P22 and LT2. Despite P22’s pac site sequence having been previously described by Wu and colleagues [13], the exact pseudo-pac site sequences and locations remained unknown.
Results
Identifying pseudo-pac site candidate sequences
We used sequencing reads from ultra-purified P22 propagated on LT2 and mapped them back to the LT2 genome to identify the locations where pseudo-pac site facilitated packaging of the LT2 genome occurred. The regions where packaging is initiated are characterized by a sudden, sharp increase in read coverage (Fig 1A). We visually identified 8 sites that matched this profile and extracted 120 base pair (bp) regions of the LT2 genome surrounding these sites (Fig 1A). We chose 120 bp because the P22 packaging machinery makes its packaging initiation cuts in a 120 bp region surrounding its pac site [17]. We searched each of the eight 120 bp regions for the sequence that best matched the 12 bp P22 consensus pac site sequence (5′ AAGATTTATCTG 3′) identified in Casjens and colleagues [12] and further characterized by Wu and colleagues [13] using P22 mutants. For generalized transduction events whose read coverage patterns sloped left to right across the LT2 genome (sites 3, 4, 5, 7, and 8), we searched the forward strand of the genome, and for events that sloped right to left (sites 1, 2, and 6), we searched the reverse strand. The 8 sequences that best matched the P22 pac site in each of the 120 bp regions (Fig 1A) are henceforth referred to as pseudo-pac site candidates. We performed a multiple sequence alignment (MSA) of the pseudo-pac site candidates and neighboring genome regions, which revealed that sequence conservation between the candidates extended beyond the 12 bp pac site consensus region (Fig 1B). We accordingly adjusted the consensus region to include all strongly conserved regions of the MSA.
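The search for the best pac-like sequence within a region can be illustrated with a minimal Python sketch. It is not the authors' R script (S1 Text). It assumes that "best match" means the 12 bp window with the fewest mismatches to the consensus (Hamming distance) and that ties go to the leftmost window; both choices, along with the function names `best_match` and `reverse_complement`, are assumptions made for illustration. The example region is invented and is not LT2 sequence.

```python
PAC_CONSENSUS = "AAGATTTATCTG"  # 12 bp P22 pac consensus quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_match(region: str, query: str = PAC_CONSENSUS):
    """Return (offset, window, mismatches) for the window of `region`
    with the fewest mismatches to `query`.

    Assumption: similarity is scored as Hamming distance and ties are
    resolved in favor of the leftmost window.
    """
    region = region.upper()
    k = len(query)
    if len(region) < k:
        raise ValueError("region is shorter than the query")
    best = None
    for i in range(len(region) - k + 1):
        window = region[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, query))
        if best is None or mismatches < best[2]:
            best = (i, window, mismatches)
    return best


if __name__ == "__main__":
    # Invented toy region; a real search would use a 120 bp genome slice.
    toy_region = "CCCC" + "AAGATTAATCTG" + "GGGG"
    print(best_match(toy_region))
    # For right-to-left sloping coverage, search the reverse strand instead.
    print(best_match(reverse_complement(toy_region)))
```

Searching the reverse strand amounts to passing the reverse complement of the region, which mirrors the strand choice described above for sites 1, 2, and 6.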
Confirming accuracy of the pseudo-pac site consensus sequence
To determine if the consensus sequence obtained using the MSA was accurate, we used it to “scan” the genome of Salmonella enterica Serovar Typhimurium 14028S (14028S) for matches. Strain 14028S is susceptible to P22 infection, and read mapping showed that 14028S shares the same generalized transduction sites and associated candidate pseudo-pac site sequences as LT2. However, when 14028S was infected with P22, we observed 1 additional generalized transduction site located on the reverse strand around 1.3 Mbp in the mapped read coverages. The additional site was seemingly not present in LT2 when infected with P22 (Fig 2A). We hypothesized that if our pseudo-pac site consensus sequence was correct, it could be used to identify the pseudo-pac site present at the additional generalized transduction site in 14028S. We built a regular expression (sequence match pattern), 5′ AAG[AG][TC][AT][AT][ATC][TC][TC]T[GT][ACG][ACG][ACG]TC 3′, that represents the bases observed for each position of the MSA. This regular expression specifies that any matching sequence will have an AAG in the first 3 positions, the fourth position must be either A or G, the fifth position must be either T or C, and so on. Five new sequences were identified in both LT2 and 14028S using the regular expression, two of which, located on the forward strand around 2.5 and 2.8 Mbp, were associated with left to right sloping read coverages immediately following the match location. These sequence matches likely represent 2 additional pseudo-pac sites in the LT2 and 14028S genomes (sites 9 and 10, respectively, in Fig 2C), which were not identified in our initial visual screen because they are less prominent in LT2 than in 14028S.
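One way to derive such a pattern from an alignment is to collect the bases observed in each column. The Python sketch below is an illustration only, not the authors' procedure: it assumes gap-free aligned sequences of equal length, the function name `regex_from_alignment` is hypothetical, and the three toy sequences are invented. `PAC_CORE_PATTERN` is simply the first 12 positions of the regular expression quoted above.

```python
import re


def regex_from_alignment(aligned):
    """Build a character-class pattern from equal-length, gap-free
    aligned sequences: one class per alignment column.

    Assumption: columns containing gaps are not handled and raise an error.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    parts = []
    for column in zip(*aligned):
        bases = sorted({b.upper() for b in column})
        if "-" in bases:
            raise ValueError("gap characters are not supported in this sketch")
        # A fully conserved column becomes a literal base,
        # a variable column becomes a character class such as [AG].
        parts.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return "".join(parts)


# First 12 positions of the regular expression quoted in the text.
PAC_CORE_PATTERN = "AAG[AG][TC][AT][AT][ATC][TC][TC]T[GT]"

if __name__ == "__main__":
    toy_alignment = ["AAGATC", "AAGGTC", "AAGACC"]  # invented sequences
    print(regex_from_alignment(toy_alignment))
    # The 12 bp P22 pac consensus satisfies the first 12 positions.
    print(re.fullmatch(PAC_CORE_PATTERN, "AAGATTTATCTG") is not None)
```

A pattern built this way accepts every combination of the bases seen per column, so it is more permissive than the set of aligned sequences themselves.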
Fig 2. (A–C) Coverage plots of the Salmonella enterica Serovar Typhimurium LT2 (LT2) and Salmonella enterica Serovar Typhimurium 14028S (14028S) genomes with sequencing reads from purified P22. The additional generalized transduction site present in 14028S but not LT2 is shaded in grey. The regular expression (Regex) patterns used to search the Salmonella genomes for additional pseudo-pac sites are displayed above their associated plots. Black vertical lines indicate the locations of the pseudo-pac site sequences that were previously identified in this study. Orange and pink dashed lines indicate the locations of regex matches on the forward and reverse Salmonella genome strands, respectively. The additional pseudo-pac sites identified with the regex searches, sites 9–11, are indicated in dashed boxes below their respective pattern match locations. (D) A summary table with information about each pseudo-pac site identified in this study numbered in order of detection. Sites 1–8 were initially visually identified using P22 read coverages mapped to the Salmonella genome. Sites 9–11 were identified using the regex searches and were confirmed visually with read coverages.
Despite the newly identified sites, the additional generalized transduction site present in 14028S was not identified by the regular expression, which indicated an error in the consensus sequence. After testing various changes to the regular expression, we ultimately found that changing the conserved G in position three to G or C enabled the identification of the additional pseudo-pac site in 14028S (Fig 2C) with minimal false positives. While we only tested a relatively small number of all the possible changes to the regular expression, we found that changes to other conserved positions, like the As in the first 2 positions or the C in the last position, caused large increases in false positive matches (S1 Fig). Eight out of the 18 and 19 matches in LT2 and 14028S, respectively, to the final regular expression do not appear to be associated with large jumps and/or sloping read coverages. This could be because the corresponding generalized transduction patterns are masked by more prominent patterns at these positions or because secondary DNA structures prevent the P22 packaging machinery from binding.
Discussion
Based on the evidence presented, we are confident that the 10 pseudo-pac site sequences identified in LT2 (Fig 2D) are the exact sequences that P22 routinely recognizes for generalized transduction. We are also confident that our final regular expression pseudo-pac site consensus sequence, 5′ AA[GC][AG][TC][AT][AT][ATC][TC][TC]T[GT][ACG][ACG][ACG]TC 3′, can identify highly efficient pseudo-pac sites in other P22-susceptible Salmonella strains, like the 11 sites identified in 14028S (Fig 2D). Our results could be further validated in vitro by genetically engineering the pseudo-pac site sequences identified in this study into a P22-susceptible host bacterium, infecting the host with P22, and sequencing the purified P22 to determine if generalized transduction was induced at the location of the inserted pseudo-pac site sequence. The identification of pseudo-pac sites in Salmonella provides fundamental insights into the sequence specificity necessary for P22 pac site recognition and opens the door to more targeted use of generalized transduction with P22.
Additionally, we hope that the methods used to identify the P22 pseudo-pac sites in LT2 and 14028S can be adapted by others to identify pseudo-pac sites used for generalized transduction in diverse phage-host systems.
Materials and methods
The ultra-purified P22 reads used for mapping against LT2 originated from previously published Illumina sequencing reads [10]. We used BBMap [21] with ambiguous=random, qtrim=lr, and minid=0.97 for mapping and pileup.sh with stdev=t and binsize=25 to create data frames for read coverage visualization in R using ggplot2 [22]. We created an R script (S1 Text) to search the eight 120 bp regions across the LT2 genome for matches to the P22 pac site. We performed the MSA of both the P22 pac site and the pseudo-pac site candidates including the respective neighboring genome regions with ClustalW [23]. We used a Full Multiple Alignment and Bootstrap NJ Tree with 1,000 trees for the ClustalW MSA. We created an R function to search both the forward and reverse strands of genome sequences for regular expression matches (S1 Text).
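A search of both strands for regular expression matches can be sketched in Python as follows. This is a hedged illustration and not the R function from S1 Text: the function name `find_sites`, the reporting of 1-based leftmost forward-strand coordinates, and the use of a lookahead to allow overlapping matches are all assumptions. The two patterns are the initial and final regular expressions quoted in the Results and Discussion, and the toy genome is invented.

```python
import re

# Patterns quoted in the text (initial and final regular expressions).
INITIAL = "AAG[AG][TC][AT][AT][ATC][TC][TC]T[GT][ACG][ACG][ACG]TC"
FINAL = "AA[GC][AG][TC][AT][AT][ATC][TC][TC]T[GT][ACG][ACG][ACG]TC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(genome: str, pattern: str):
    """Return sorted (position, strand, sequence) tuples for all matches.

    Assumptions: `position` is the 1-based leftmost coordinate on the
    forward strand, and `sequence` is reported 5' to 3' on the matching
    strand. A lookahead is used so that overlapping matches are kept.
    """
    genome = genome.upper()
    n = len(genome)
    overlapping = re.compile("(?=(" + pattern + "))")
    hits = []
    for m in overlapping.finditer(genome):
        hits.append((m.start() + 1, "+", m.group(1)))
    for m in overlapping.finditer(reverse_complement(genome)):
        length = len(m.group(1))
        # Convert the reverse-strand offset to a forward-strand coordinate.
        hits.append((n - m.start() - length + 1, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Invented toy genome: one forward-strand site with G at position three
    # and one reverse-strand site with C at position three.
    toy = ("GGGG" + "AAGATTTATCTGAAATC" + "GGGG"
           + reverse_complement("AACATTTATCTGAAATC") + "GGGG")
    print(find_sites(toy, INITIAL))
    print(find_sites(toy, FINAL))
```

On this toy input the initial pattern finds only the forward-strand site, while the final pattern also finds the reverse-strand site, which mirrors in miniature how relaxing position three recovered the additional site in 14028S.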
Supporting information
S1 Fig. Examples of Salmonella genome matches to other regular expression patterns.
(A–C) Coverage plots of the Salmonella enterica sv. Typhimurium LT2 (LT2) and Salmonella enterica sv. Typhimurium 14028S (14028S) genomes with sequencing reads from purified P22. The additional generalized transduction site present in 14028S but not LT2 is shaded in gray. The regular expression (Regex) patterns used to search the Salmonella genomes for additional pseudo-pac sites are displayed above their associated plots. Black vertical lines indicate the locations of the pseudo-pac site sequences that were previously identified in this study. Orange and pink dashed lines indicate the locations of regular expression matches on the forward and reverse Salmonella genome strands, respectively.
https://doi.org/10.1371/journal.ppat.1012301.s001
(TIF)
S1 Text. R code used to identify the pseudo-pac site sequences.
https://doi.org/10.1371/journal.ppat.1012301.s002
(DOCX)
References
- 1. Gozzi K, Tran NT, Modell JW, Le TBK, Laub MT. Prophage-like gene transfer agents promote Caulobacter crescentus survival and DNA repair during stationary phase. PLoS Biol. 2022 Nov 3;20(11):e3001790. pmid:36327213
- 2. Penadés JR, Chen J, Quiles-Puchalt N, Carpena N, Novick RP. Bacteriophage-mediated spread of bacterial virulence genes. Curr Opin Microbiol. 2015 Feb 1;23:171–8. pmid:25528295
- 3. Haaber J, Leisner JJ, Cohn MT, Catalan-Moreno A, Nielsen JB, Westh H, et al. Bacterial viruses enable their host to acquire antibiotic resistance genes from neighbouring cells. Nat Commun. 2016 Nov 7;7(1):13333. pmid:27819286
- 4. Fillol-Salom A, Martínez-Rubio R, Abdulrahman RF, Chen J, Davies R, Penadés JR. Phage-inducible chromosomal islands are ubiquitous within the bacterial universe. ISME J. 2018 Sep;12(9):2114–28. pmid:29875435
- 5. Zinder ND, Lederberg J. Genetic Exchange in Salmonella. J Bacteriol. 1952;64(5):679–699. pmid:12999698
- 6. Zinder ND. Bacterial transduction. J Cell Comp Physiol. 1955;45(S2):23–49. pmid:13242624
- 7. Ozeki H. Chromosome Fragments Participating in Transduction in Salmonella Typhimurium. Genetics. 1959 May;44(3):457–70. pmid:17247838
- 8. Schmieger H, Backhaus H. Altered Cotransduction Frequencies Exhibited by HT-Mutants of Salmonella-Phage P22.
- 9. Schmieger H. Packaging signals for phage P22 on the chromosome of Salmonella typhimurium. Mol Gen Genet MGG. 1982 Oct 1;187(3):516–8. pmid:6757664
- 10. Kleiner M, Bushnell B, Sanderson KE, Hooper LV, Duerkop BA. Transductomics: sequencing-based detection and analysis of transduced DNA in pure cultures and microbial communities. Microbiome. 2020 Nov 15;8(1):158. pmid:33190645
- 11. Fillol-Salom A, Bacigalupe R, Humphrey S, Chiang YN, Chen J, Penadés JR. Lateral transduction is inherent to the life cycle of the archetypical Salmonella phage P22. Nat Commun. 2021 Nov 8;12(1):6510. pmid:34751192
- 12. Casjens S, Huang WM, Hayden M, Parr R. Initiation of bacteriophage P22 DNA packaging series: Analysis of a mutant that alters the DNA target specificity of the packaging apparatus. J Mol Biol. 1987 Apr 5;194(3):411–22.
- 13. Wu H, Sampson L, Parr R, Casjens S. The DNA site utilized by bacteriophage P22 for initiation of DNA packaging. Mol Microbiol. 2002;45(6):1631–1646. pmid:12354230
- 14. Jackson EN, Jackson DA, Deans RJ. EcoRI analysis of bacteriophage P22 DNA packaging. J Mol Biol. 1978 Jan 25;118(3):365–88. pmid:344888
- 15. Tye BK, Botstein D. P22 morphogenesis II: Mechanism of DNA encapsulation. J Supramol Struct. 1974;2(2–4):225–238.
- 16. Casjens S, Huang WM. Initiation of sequential packaging of bacteriophage P22 DNA. J Mol Biol. 1982 May 15;157(2):287–98. pmid:6286978
- 17. Casjens S, Sampson L, Randall S, Eppler K, Wu H, Petri JB, et al. Molecular genetic analysis of bacteriophage P22 gene 3 product, a protein involved in the initiation of headful DNA packaging. J Mol Biol. 1992 Oct 20;227(4):1086–99. pmid:1433288
- 18. Raj AS, Raj AY, Schmieger H. Phage genes involved in the formation of generalized transducing particles in Salmonella-phage P22. Mol Gen Genet. 1974 Jun 1;135(2):175–84.
- 19. Thierauf A, Perez G, Maloy S. Generalized Transduction. In: Clokie MRJ, Kropinski AM, editors. Bacteriophages: Methods and Protocols, Volume 1: Isolation, Characterization, and Interactions [Internet]. Totowa, NJ: Humana Press; 2009 [cited 2023 Aug 2]. p. 267–86. (Methods in Molecular Biology).
- 20. Chelala CA, Margolin P. Evidence that HT mutant strains of bacteriophage P22 retain an altered form of substrate specificity in the formation of transducing particles in Salmonella typhimurium. Genet Res. 1976 Apr;27(2):315–22. pmid:776744
- 21. Bushnell B. BBMap: A Fast, Accurate, Splice-Aware Aligner [Internet]. Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); 2014 Mar [cited 2024 Mar 12]. Report No.: LBNL-7065E. https://www.osti.gov/biblio/1241166.
- 22. Wickham H. ggplot2: Elegant Graphics for Data Analysis [Internet]. Cham: Springer International Publishing; 2016 [cited 2024 Jun 10]. (Use R!). http://link.springer.com/10.1007/978-3-319-24277-4.
- 23. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994 Nov 11;22(22):4673–80. pmid:7984417