Prophages are integrated viral forms in bacterial genomes that have been found to contribute to interstrain genetic variability. Many virulence-associated genes are reported to be prophage encoded. Present computational methods detect prophages either by identifying possible essential proteins such as integrases or, as an extension of this technique, by identifying a region containing proteins similar to those occurring in prophages. These methods suffer from low sequence similarity at the protein level, which suggests that a nucleotide-based approach could be useful.
Dinucleotide relative abundance (DRA) has earlier been used to identify regions in genomes that deviate from their neighboring areas. We have used the difference in dinucleotide relative abundance (DRAD) between bacterial and prophage DNA to help locate DNA stretches that could be of prophage origin in bacterial genomes. Prophage sequences that deviate from bacterial regions in their dinucleotide frequencies are detected by scanning bacterial genome sequences. The method was validated using a subset of genomes with prophage data from literature reports. A web interface for prophage scanning based on this method is available at http://bicmku.in:8082/prophagedb/dra.html. Two hundred bacterial genomes that do not have annotated prophages have been scanned for prophage regions using this method.
The relative dinucleotide distribution difference helps detect prophage regions in genome sequences. The usefulness of this method is seen in the identification of 461 highly probable prophage loci that had not been annotated earlier. This work emphasizes the need to extend the efforts to detect and annotate prophage elements in genome sequences.
Citation: Srividhya KV, Alaguraj V, Poornima G, Kumar D, Singh GP, Raghavenderan L, et al. (2007) Identification of Prophages in Bacterial Genomes by Dinucleotide Relative Abundance Difference. PLoS ONE 2(11): e1193. doi:10.1371/journal.pone.0001193
Academic Editor: Joel Sussman, Weizmann Institute of Science, Israel
Received: February 20, 2007; Accepted: October 27, 2007; Published: November 21, 2007
Copyright: © 2007 Srividhya et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Bioinformatics facilities provided by Dept of Biotechnology, Govt of India under CoE. CSIR for fellowship to PM, KVS, AVSKKM,UGC for fellowship to VA. The funding organisation had no role in the design and conduct of the study; collection, analysis, interpretation of data; and in the preparation, review or approval of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Bacterial genomes evolve through a variety of processes, including horizontal gene transfer, to survive under selective pressures exerted by the environment. Internal modifications of the genome by intergenomic homologous recombination and horizontal gene transfer (HGT) (intragenic recombination) have been prime reasons for bacterial genome diversity. Mobile elements are responsible for the transfer of new functions to a bacterial cell and are recognized as important agents in bacterial evolution.
Bacteriophages (phages) are intracellular parasites that infect bacteria. Lytic phages, upon infecting a cell, reproduce, lyse the cell and release progeny phages. Lysogenic or temperate phages, however, either multiply via the lytic cycle or enter a quiescent state in the cell. Prophages comprise such phage DNA in the integrated state. Fully functional prophages are capable of excision from the bacterial chromosome, either spontaneously or in response to specific signals, particularly those arising from damage to the host DNA. These lyse the host cells at some subsequent generation upon induction. Prophages can also be defective (in a state of mutational decay and not induced to lytic growth) or be satellites (not carrying their own structural protein genes but capable of encapsidation by capsid proteins of other virions).
Prophages can affect the fitness of bacteria to survive. These effects, as elaborated by Brussow et al., 2004, include (i) lysogenic conversion, (ii) genome rearrangements, (iii) gene disruption, (iv) protection from lytic infection, (v) lysis of competing strains and (vi) introduction of new fitness factors (lysogenic conversion, transduction). Prophage–bacterial interaction has also been looked at from an ecological perspective by Chibani-Chennoufi et al., 2004. Such interaction becomes an essential survival strategy for both the prophage and the bacteria.
Prophages can constitute as much as 10–20% of a bacterium's genome and contribute to interstrain variability. The most extreme case is currently represented by the food pathogen Escherichia coli O157:H7 strain Sakai, which contains 18 prophage elements amounting to 16% of its total genome content. Many of these prophages are cryptic and in a state of mutational decay. Around 230 prophages are reported in 51 genomes. Bacteriophages and prophages are major contributors to diversification in microbes. The impact of prophages on bacterial chromosomes has been reviewed extensively, and it is seen that prophages are key agents for lateral gene transfer.
Prophages harbor virulence factors and pathogenicity islands, thereby playing an important role in the emergence of pathogens. This was recognized for diphtheria toxins and botulinum toxins, which are phage encoded. Virulence factors associated with prophage loci include toxins, pili (fimbriae), adhesins and secretion systems. The CTXphi prophage of Vibrio cholerae encodes pathogenicity islands which it transfers into Vibrio mimicus. It has been pointed out that gain of virulence is not the only mechanism by which pathogenicity develops. In the prophage database (http://bicmku.in:8082) around 15 prophages are seen to encode virulence factors, including toxins and adhesins, which contribute to pathogenicity in microbes.
Prokaryotic genomes and associated fitness islands
Genomic islands increase the fitness of the bacterium. Such fitness islands are classified into several subtypes, such as ecological islands, saprophytic islands etc., based on their niche. These islands contribute to the host survival in the given environment. In many cases the fitness factor temporarily or permanently resides in the host, either providing some benefit (‘symbiosis islands’) or causing damage (pathogenicity islands, PAIs) by interacting with living hosts. This flexible gene pool of bacteria is composed of prophages and other mobile elements or regions, in contrast to the core gene pool, which comprises the chromosomal segments pertaining to bacterial metabolic functions. Pathogenicity islands are being explored quite frequently to understand disease development and the evolution of bacterial pathogenesis. The role of pathogenicity islands in microbial evolution has been subject to extensive review. Yoon et al., 2005 have looked at 148 prokaryotic sequences and identified 77 candidate PAIs by applying a homology-based method combined with abnormalities detected in genomic composition. Interestingly, the same aspect could be looked at for understanding the evolution of eukaryotes by analyzing regions which deviate from the template DNA signature.
As reported by Brussow et al., 2004, prophages harbor morons (more DNA), which provide extra fitness to the organism and are retained, imparting the bacterial host with some unique phenotype. Virulence factors have also been associated with prophages. A database of bacterial virulence factors (VFs) associated with various medically significant bacterial pathogens is available. VFDB summarizes the conventional VFs (toxins, enzymes, cell-surface structures such as capsular polysaccharides, lipopolysaccharides and outer membrane proteins, secretion machineries, siderophores, catalases, regulators) which directly or indirectly regulate pathogenesis in 16 important bacterial pathogens. The mechanism of bacterial pathogenicity mediated by the above VFs has been extensively studied by Wilson et al.
Detection of genome heterogeneity
Heterogeneity in genomes is represented in many ways. Some of these include local and global variations in GC content, direct and inverted repeats, oligonucleotide relative abundance, and genome mosaicism due to HGT, transposition and recombination events. Methods have been developed to identify potential foreign genes acquired by bacterial genomes through horizontal gene transfer. A direct experimental method is subtractive hybridization. Comprehensive assessment of the extent of lateral gene transfer can be made easily by genomic subtraction, a procedure to enrich sequences that are present in one genome but not in another by using biotinylated subtractor DNA to fish out the target DNA by hybrid formation. Several cycles of hybridization with newly added subtractor DNA then remove target DNA with sequences present in both target and subtractor strains. The remaining unbound target DNA is enriched in sequences absent in the subtractor DNA. This has been done for detecting lateral gene transfer, for example, in four strains of Salmonella enterica. Indirect approaches include assessment of GC content, codon usage pattern and amino acid usage, and dinucleotide relative abundance. For example, HGT-DB is a repository of all the prokaryotic HGTs detected based on their deviation in G+C content, codon and amino acid usage from prokaryotic complete genomes. Genome heterogeneity in terms of short oligonucleotide compositional extremes and dinucleotide relative abundance distances between different parts of genomes has been examined by Karlin et al., 1994. This method focuses on small DNA sequences as an alternative to whole genome comparison methods and provides a meaningful measure of similarities. It has been observed that the dinucleotide relative abundance signature could discriminate local structure specificity more than sequence specificity. Dinucleotide relative abundance values are regarded as a stable property of the DNA of an organism.
The method has been applied to phage genomes to understand similarities and dissimilarities associated with them. Compositional biases prevalent in bacterial genomes have also been examined by oligonucleotide distribution. The significance of dinucleotide signatures in genome heterogeneity has been extensively reviewed by Karlin et al., 1997 in three facets, namely extremes of dinucleotide abundance, differences in genomic signatures in prokaryotes, and evolution of genomes with respect to genomic signatures. Dinucleotide TA is seen to be underrepresented in eukaryotic genomes and not in viral and mitochondrial genomes. In contrast, viral genomes are seen to be CG dinucleotide suppressed. The transposable elements of A. thaliana, C. elegans, D. melanogaster, H. sapiens and S. cerevisiae display a similar pattern of relative abundance of dinucleotides in comparison with their respective host genomes. This principle was extended to prophage loci detection in microbial genomes.
Prophage Identification methods in prokaryotic genomes
Recognizing prophages in bacterial genome sequences is not a straightforward task, as prophage sequences are mosaic and encode many orphan and hypothetical proteins; unambiguous identification is therefore difficult. Extensive work has been done on detecting ‘cornerstone genes’ for the purpose of identifying prophages in bacterial genomes. Integrases are usually sufficiently conserved to be recognizable. Although most temperate phages have an integrase gene, its presence is not a necessary and sufficient condition to prove the existence of a prophage. Prophages do harbor some phage virion assembly proteins such as terminase, portal protein, head maturation protease, coat protein and tail tape measure protein.
A comprehensive bioinformatic analysis was earlier carried out for the e14 cryptic prophage sequence. This showed that e14 is modular and shares a large part of its sequence with Shigella flexneri phage SfV. Based on this similarity, the regulatory region including the repressor and Cro proteins and their promoter binding sites was identified. A protein-based comparative approach using the COG database as a starting point was carried out to detect new lambdoid prophage-like elements in a set of completely sequenced genomes. This protein similarity approach (PSA) was extended by the use of BLAST similarity searches rather than limiting it to the COG database. The PSA method was tested with bacterial genomes having known reports of prophages and then extended to newly sequenced bacteria. A total of 87 prophage loci could be identified from 61 bacteria. Bose and Barber, 2006 have implemented a prophage loci prediction tool for prokaryotic genome sequences based on BLASTX sequence comparison against phage proteomes. Subsequently, a heuristic automated program proposed by Fouts, 2006 for prophage detection enables multiple curation of an identified prophage locus by comparison with HMMs of phage proteins and further facilitates subclassification of the identified locus.
The dinucleotide relative abundance (DRA) approach takes into account the local heterogeneity within a given bacterial genome. DRA values are reported to remain relatively uniform within a genome and its closely related organisms. On this basis, the collection of sixteen DRA values has been referred to as a genomic signature. Local heterogeneity in DRA values has thus been used to detect alien regions in bacterial genomes. This method has also been applied to phage genomes to understand similarities and dissimilarities associated with them. We have modified this approach to detect prophages in bacterial genomes. Putative prophage regions could be identified by finding local regions of bacterial genomes that show significant deviation in dinucleotide abundance relative to the background. However, these regions should also show dinucleotide abundance similar to that of a reference set of non-redundant prophage sequences relevant for those bacteria. Hence taking a dinucleotide relative abundance difference (DRAD), with reference to the two cases described, improves the ability to detect the deviant regions. Since not all the dinucleotides show variation, an appropriate selection helps to further increase the discrimination of the prophage regions.
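The sixteen-value genomic signature mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used odds-ratio form of Karlin and co-workers, rho_XY = f_XY / (f_X f_Y), and for brevity computes it on a single strand; the published signature is usually computed on the sequence combined with its reverse complement. The function name `genomic_signature` is hypothetical and is not part of the authors' software.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 dinucleotides


def genomic_signature(seq):
    """Return {XY: rho_XY} with rho_XY = f_XY / (f_X * f_Y).

    Assumption: single-strand odds ratio; strand symmetrization is omitted.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = sum(mono.values())
    n_di = sum(pairs[d] for d in DINUCS)  # ignore pairs containing N etc.
    signature = {}
    for d in DINUCS:
        f_x = mono[d[0]] / n_mono if n_mono else 0.0
        f_y = mono[d[1]] / n_mono if n_mono else 0.0
        f_xy = pairs[d] / n_di if n_di else 0.0
        # Undefined ratios (a base that never occurs) are reported as 0.0
        signature[d] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return signature


if __name__ == "__main__":
    sig = genomic_signature("ACGT" * 50)
    for dinuc in ("AC", "AA"):
        print(dinuc, round(sig[dinuc], 3))
```

Values near 1 indicate that a dinucleotide occurs about as often as its base composition predicts; values well above or below 1 indicate over- or under-representation.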
Results and Discussion
A program to detect prophage regions (both functional prophages and prophage remnants or highly defective prophages) was developed based on DRAD analysis. From a total of 52 genomes, 325 probable prophage loci could be identified. Of these, 95 prophage loci were earlier reported in the literature (Table 1). The remaining 230 were newly identified loci, among which 159 were highly probable loci. Details are available at http://bicmku.in:8082/prophagedb/newprophages.html.
The sensitivity and specificity of the method averaged around 82% and 83% respectively (Table 2) but varied among genomes. Our analysis suggests that the variation is not related to the GC content. The variation is possibly related to the non-redundant nature of the prophage set used for the detection.
A comparison between the prophages identified by our method, those reported by Casjens and those identified by the phage_finder program shows a common overlap of 47 prophages (Figure 1 and Figure 2). The details of the prophage loci reported by the different methods are given at http://bicmku.in:8082/prophagedb/prophage_different_methods.htm. The detection of prophages varies between genomes, suggesting that it would be necessary to use more than one method, depending on the genome, in order to locate all possible prophages. This probably arises from the mosaic nature of prophages.
Figure legend: prophages identified by the method reported here (DRAD) are indicated in green; prophage loci reported in the literature and those identified by the phage_finder program are shown in yellow and red, respectively.
Bacterial genomes with no earlier report of prophages
The DRAD method was used to examine genome sequences with no reports of prophages. A total of 200 genome sequences were analyzed for prophage elements using this approach. Out of the 453 loci identified from 84 bacterial genomes, 207 (from 64 genomes) were seen to be highly probable prophage loci, based on the annotation in the protein table files of the corresponding bacterial genomes. The genome of Shigella sonnei had a high incidence of thirteen prophages (Figure 3; details at http://bicmku.in:8082/prophagedb/patho_prophages.html).
Figure 3 legend: pink, Shigella sonnei genome versus Shigella sonnei genome; blue, Shigella sonnei genome versus the prophage dataset; yellow, their dinucleotide relative abundance difference (DRAD) value.
Prophages in bacterial genomes with varied ecological niche
The acquisition of ecological islands by the bacterial host occurs through horizontal gene transfer. A total of 96 prophage loci could be identified from 35 bacterial genomes (Table 3) of organisms that grow in extreme ecological niches or are being exploited for industrial production. The detailed loci of the prophages are available at http://bicmku.in:8082/prophagedb/eco_prophages.html.
Pathogenicity islands and prophages
The role of bacteriophages in contributing to pathogenicity has been reviewed by Tinsley et al., 2006. Prophage loci are seen to encode pathogenicity islands. This study showed that in the 29 pathogenic bacterial genomes screened (Table 4), 207 prophage loci were identified. Of these, 111 were seen to encode virulence or fitness factors. Details of the loci are available at http://bicmku.in:8082/prophagedb/patho_prophages.html. The observations suggest that acquisition of virulence genes through horizontally transferred prophages could be a common strategy of microbes undergoing transformation from a commensal to a pathogen. With the availability of bacterial genome sequences, it is evident that inter-species transmission of genetic information is pervasive in microbes and that, in parallel, acquisition of foreign genes is counterbalanced by loss of native genes in order to maintain genome size within limits.
The DRAD analysis carried out with Bacillus anthracis showed two prophage loci that encode morons (glucosyl transferase). This supplements the report of four prophages being associated with B. anthracis by Sozhamannan et al., 2006. Erwinia carotovora subsp. atroseptica is an important bacterial plant pathogen causing soft rot and blackleg in potato. As a member of the Enterobacteriaceae, it is related to Escherichia, Shigella, Salmonella and Yersinia. In this study, Erwinia was found to harbor a total of 7 prophages encoding Type IV pilus protein and flagellar proteins. Similarly, in the pathogenic H. pylori genome, the DRAD analysis identified prophage loci that encode Cag island proteins, which pertain to pathogenicity. The same Cag island has been reported by Yoon et al., 2005 as a potential PAI. Moreover, in Chromobacterium violaceum ATCC 12472, Bordetella pertussis Tohama I, Helicobacter pylori J99, Photorhabdus luminescens TT01 and Vibrio parahaemolyticus RIMD 2210633 (Table 4), the prophage loci identified by DRAD compare well with the PAIs reported by Yoon et al., 2005.
In the case of Mycobacterium avium, the prophage region detected by DRAD was found to encode MurA, which has been implicated in M. tuberculosis resistance to a range of broad-spectrum antimicrobial agents. With Mycobacterium bovis, out of the three prophages that were detected, one was found to harbor PE-PGRS genes, a family encoding numerous repetitive glycine-rich proteins of unknown function. PE-PGRS proteins are reported to be associated with mycobacterial species (M. tuberculosis, M. bovis BCG, M. smegmatis, M. marinum and M. gordonae) and 11 clinical isolates of M. tuberculosis. This again highlights the possible contribution of prophages to the virulence of the associated bacterial species.
Salmonella enterica subsp. enterica serovar Choleraesuis is a highly invasive serovar among non-typhoidal Salmonella that usually causes sepsis or extra-intestinal focal infections in humans. The DRAD analysis of the bacterial genome showed a high incidence of prophages. The loci identified encode Gifsy-2 and Gifsy-1 prophage-like proteins. Most of the loci encode a few to many fimbrial proteins, surface presentation antigens and secretion system apparatus, which are key genes involved in virulence. In the case of Salmonella enterica Paratyphi, a human-restricted serovar of Salmonella enterica that causes typhoid, nine prophage loci could be identified, and these predominantly encode pathogenicity islands apart from secretion systems.
Maurelli et al., 1998 have reported the role of genomic deletion (of LCD, lysine decarboxylase) in contributing to the pathogenicity of Shigella spp. In S. sonnei, which is involved in mucoid diarrhea, 13 highly probable prophage loci could be detected. In all three species of Shigella (S. sonnei, S. boydii and S. dysenteriae), almost all the loci are associated with insertion sequence elements, ranging in number from 3 to 10. A few of the possible prophage loci are seen to harbor virulence factors like siderophores. In Vibrio parahaemolyticus, the two prophage loci that have been detected (Table 4) encode pilus assembly protein and restriction proteins. Recently, horizontal gene transfer of CTXphi prophage-encoded PAIs has been reported between V. mimicus and V. cholerae, indicating that the Vibrios share such virulence-associated gene pools.
Prophages, including defective ones, can contribute important biological properties to their bacterial hosts. In order to understand completely the nature of bacterial behavior, one must be able to recognize the full complement of prophages in bacterial genomes. The extreme variability of prophage sequences, as seen by our comparisons, makes it quite possible that unrecognized prophages are still present in bacterial genome sequences (Casjens, 2003). We have presented a dinucleotide distribution difference method for identification of prophages from microbial genome sequences. Prophage detection methods such as the one described here, based on dinucleotide composition, and those reported earlier, based on similarity at the protein level, tend to supplement each other. With increasing numbers of microbial genome sequences becoming available, consensus methods will probably emerge for identifying potential prophage loci in microbial genomes. These will help explain the prophage-mediated evolution of microbes.
Materials and Methods
The Dinucleotide Relative Abundance (DRA) measure was modified for prophage detection.
For a given dinucleotide XY,
[Equation 1]
where obsfXY is the observed frequency of the dinucleotide XY occurring in a chosen window and expfXY is the expected frequency of the dinucleotide XY occurring in the reference set.
[Equation 2]
DRAbact is calculated using the observed dinucleotide frequencies for a window of the bacterial genome and the expected frequencies of the dinucleotide occurring over the entire bacterial genome. The DRAbact values using a sliding window are calculated for the entire genome and plotted against the bacterial genome sequence position. DRAprophage is calculated using the observed dinucleotide frequencies for a window of the bacterial genome and the expected frequencies of the dinucleotide occurring over the entire prophage reference set. The DRAprophage values using a sliding window are calculated for the entire genome and plotted against the bacterial genome sequence position.
[Equation 3]
The DRAD or DRAdiff is calculated for each window and plotted against the bacterial genome sequence position. Regions of high DRAdiff values are used to identify possible prophage-like regions. By trial and error, using known prophage regions, a window size of 25000 with a displacement of 1000 was standardized for the screening.
A hit was annotated as a potential prophage locus and taken as a true positive if the annotation in the protein table (ptt) file for the locus had phage-associated genes. Regions without any phage marker genes were considered false positives. The annotations of each peak locus (corresponding to each prophage) were retrieved from the protein table (ptt) file of the respective bacterial genome. False negatives are prophages reported in the literature but not detected by DRAD.
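The sliding-window scan described above can be sketched in Python. Because the equations are not reproduced here, the functional forms below are assumptions rather than the authors' definitions: the DRA of a window against a reference is taken as the summed absolute deviation of obs/exp from 1 over the chosen dinucleotides, and DRAD is taken as DRAbact minus DRAprophage, so that a window unlike the host background but like the prophage reference scores high. All function and parameter names are hypothetical; only the window size (25000) and displacement (1000) come from the text.

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinuc_freqs(seq):
    """Relative frequency of each of the 16 dinucleotides in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCS)
    return {d: (counts[d] / total if total else 0.0) for d in DINUCS}


def dra_deviation(obs, exp, dinucs=DINUCS):
    """ASSUMED form: sum over dinucleotides of |obs/exp - 1|.

    Dinucleotides absent from the reference (exp == 0) are skipped.
    """
    return sum(abs(obs[d] / exp[d] - 1.0) for d in dinucs if exp[d] > 0)


def drad_scan(genome, prophage_ref, window=25000, step=1000, dinucs=DINUCS):
    """Return [(window_start, DRAD)] with ASSUMED DRAD = DRAbact - DRAprophage."""
    bact_exp = dinuc_freqs(genome)         # expected: whole bacterial genome
    phage_exp = dinuc_freqs(prophage_ref)  # expected: prophage reference set
    scores = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        obs = dinuc_freqs(genome[start:start + window])
        dra_bact = dra_deviation(obs, bact_exp, dinucs)
        dra_phage = dra_deviation(obs, phage_exp, dinucs)
        scores.append((start, dra_bact - dra_phage))
    return scores


if __name__ == "__main__":
    # Toy data: an AC-rich "host" with a GT-rich insert at positions 600-799.
    toy_genome = "AC" * 300 + "GT" * 100 + "AC" * 300
    toy_reference = "GT" * 200
    scan = drad_scan(toy_genome, toy_reference, window=200, step=100)
    print(max(scan, key=lambda item: item[1]))
```

The `dinucs` argument is one way to express the selection of informative dinucleotides mentioned in the text; which subset to pass is left open here. Recomputing frequencies for every window keeps the sketch short but is slow on full genomes, where an incremental update of the counts would be preferable.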
The probable specificity (ratio of true positives to the sum of true positives and false positives) and probable sensitivity (ratio of true positives to the sum of true positives and false negatives) were calculated according to Makarov, 2002. The qualifier probable has been added to the specificity and sensitivity measures because the data used for validation cannot be assumed to be complete: there could be prophages that are yet to be detected. A server for the detection of prophages based on comparison of Dinucleotide Relative Abundance Difference (DRAD or DRAdiff) values is available at http://bicmku.in:8082/prophagedb/dra.html.
Bacterial genomes were downloaded from the NCBI ftp site (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/). Prophage positions and sequences obtained from the supplementary material of Casjens (2003) are available in the prophage database (http://bicmku.in:8082/prophagedb, Srividhya et al 2006). The location of prophages in bacterial genomes was determined by using the protein table (ptt) file from NCBI.
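The two ratios defined above are straightforward to compute; the short Python sketch below simply restates them. Note that specificity as defined here, TP / (TP + FP), is the quantity often called precision or positive predictive value elsewhere. The function names and the counts in the example are hypothetical and are not results from this study.

```python
def probable_specificity(true_pos, false_pos):
    """TP / (TP + FP), as defined in the text (elsewhere called precision)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


def probable_sensitivity(true_pos, false_neg):
    """TP / (TP + FN)."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts for one genome, for illustration only.
    tp, fp, fn = 8, 2, 2
    print(probable_specificity(tp, fp), probable_sensitivity(tp, fn))
```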
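The annotation check used to label a detected locus can be sketched as follows. This is only an illustration under stated assumptions: gene records are taken to be already parsed from a protein table into (start, end, product) tuples, and a locus counts as a true positive when any overlapping gene product mentions a phage-related keyword. The keyword list is a hypothetical heuristic drawn from the marker proteins named earlier (integrase, terminase, portal, coat, tail); the criteria actually applied by the authors are not specified beyond "phage associated genes".

```python
# Hypothetical keyword list; not the authors' actual criteria.
PHAGE_KEYWORDS = ("phage", "integrase", "terminase", "portal", "coat protein", "tail")


def phage_genes_in_locus(genes, start, end, keywords=PHAGE_KEYWORDS):
    """Return gene records overlapping [start, end] whose product looks phage-related.

    genes: iterable of (gene_start, gene_end, product_description) tuples.
    """
    hits = []
    for gene_start, gene_end, product in genes:
        overlaps = gene_start <= end and gene_end >= start
        if overlaps and any(key in product.lower() for key in keywords):
            hits.append((gene_start, gene_end, product))
    return hits


def classify_locus(genes, start, end):
    """Label a detected locus following the true/false positive rule in the text."""
    return "true positive" if phage_genes_in_locus(genes, start, end) else "false positive"


if __name__ == "__main__":
    # Made-up gene records for illustration only.
    example_genes = [
        (100, 900, "DNA gyrase subunit A"),
        (1200, 2300, "phage integrase"),
        (5000, 6000, "hypothetical protein"),
    ]
    print(classify_locus(example_genes, 1000, 3000))
    print(classify_locus(example_genes, 4000, 7000))
```

Simple substring matching will miss phage genes annotated only as hypothetical proteins and can match unrelated products, so in practice the hits would need manual review.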
Construction of Non-redundant Prophage set (NRPS)
For detection of new prophages in bacterial genomes, a set of non-redundant prophages was constructed, which includes prophages (without repetition) from 50 bacterial genomes from the prophage database (http://bicmku.in:8082). This constitutes the NRPS (non-redundant prophage set), which was used to screen for prophages in any given bacterial genome. The list of prophages taken for NRPS generation is given at http://bicmku.in:8082/prophagedb/nrlist.html.
Conceived and designed the experiments: SK PM. Performed the experiments: KS VA GP LR. Analyzed the data: SK KS GP LR DK. Contributed reagents/materials/analysis tools: VA GS PM DK AM. Wrote the paper: SK KS.
- 1. Arber W (2000) Genetic Variation molecular mechanisms and impact on microbial evolution. FEMS Microbiol 24: 1–7.
- 2. Chitra D, Archana P (2002) Horizontal gene transfer and bacterial diversity. J Biosci 27: 27–33.
- 3. Tinsley CR, Bille E, Nassif X (2006) Bacteriophages and pathogenicity: more than just providing a toxin? Microbes Infect 8: 1365–1371.
- 4. Campbell A (2001) Lysogeny. In: Encyclopedia of Life Sciences. pp. 1–6.
- 5. Casjens S (2003) Prophages and bacterial genomics: what have we learned so far? Mol Microbiol 49: 277–300.
- 6. Brussow H, Canchaya C, Hardt WD (2004) Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev 68: 560–602.
- 7. Chibani-Chennoufi S, Bruttin A, Dillmann ML, Brussow H (2004) Phage-host interaction: an ecological perspective. J Bacteriol 186: 3677–3686.
- 8. Canchaya C, Proux C, Fournous G, Bruttin A, Brussow H (2003) Prophage genomics. Microbiol Mol Biol Rev 67: 238–276.
- 9. Ohnishi M, Kurokawa K, Hayashi T (2001) Diversification of Escherichia coli genomes: are bacteriophages the major contributors? Trends Microbiol 9: 481–485.
- 10. Hendrix RW, Smith MC, Burns RN, Ford ME, Hatfull GF (1999) Evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage. Proc Natl Acad Sci U S A 96: 2192–2197.
- 11. Canchaya C, Fournous G, Brussow H (2004) The impact of prophages on bacterial chromosomes. Mol Microbiol 53: 9–18.
- 12. Canchaya C, Fournous G, Chibani-Chennoufi S, Dillmann ML, Brussow H (2003) Phage as agents of lateral gene transfer. Curr Opin Microbiol 6: 417–424.
- 13. Wagner PL, Waldor MK (2002) Bacteriophage control of bacterial virulence. Infect Immun 70: 3985–3993.
- 14. Boyd EF, Davis BM, Hochhut B (2001) Bacteriophage-bacteriophage interactions in the evolution of pathogenic bacteria. Trends Microbiol 9: 137–144.
- 15. Li M, Kotetishvili M, Chen Y, Sozhamannan S (2003) Comparative genomic analyses of the vibrio pathogenicity island and cholera toxin prophage regions in nonepidemic serogroup strains of Vibrio cholerae. Appl Environ Microbiol 69: 1728–1738.
- 16. Boyd EF, Heilpern AJ, Waldor MK (2000) Molecular analyses of a putative CTXphi precursor and evidence for independent acquisition of distinct CTX(phi)s by toxigenic Vibrio cholerae. J Bacteriol 182: 5530–5538.
- 17. Wilson JW, Schurr MJ, LeBlanc CL, Ramamurthy R, Buchanan KL, Nickerson CA (2002) Mechanisms of bacterial pathogenicity. Postgrad Med J 78: 216–224.
- 18. Hacker J, Carniel E (2001) Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep 2: 376–381.
- 19. Schmidt H, Hensel M (2004) Pathogenicity islands in bacterial pathogenesis. Clin Microbiol Rev 17: 14–56.
- 20. Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol 23: 1089–1097.
- 21. Smith J (2001) The social evolution of bacterial pathogenesis. Proc Biol Sci 268: 61–69.
- 22. Yoon SH, Hur CG, Kang HY, Kim YH, Oh TK, Kim JF (2005) A computational approach for identifying pathogenicity islands in prokaryotic genomes. BMC Bioinformatics 6: 184.
- 23. Chen L, Yang J, Yu J, Yao Z, Sun L, Shen Y, Jin Q (2005) VFDB: a reference database for bacterial virulence factors. Nucleic Acids Res 33: D325–D328.
- 24. Lan R, Reeves PR (1996) Gene Transfer is major factor in gene evolution. Mol Biol Evol 13: 47–55.
- 25. Karlin S (1998) Global dinucleotide signatures and analysis of genomic heterogeneity. Curr Opin Microbiol 1: 598–610.
- 26. Karlin S, Burge C (1995) Dinucleotide relative abundance extremes: a genomic signature. Trends Genet 11: 283–290.
- 27. Garcia-Vallve S, Guzman E, Montero MA, Romeu A (2003) HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Res 31: 187–189.
- 28. Karlin S, Ladunga I, Blaisdell BE (1994) Heterogeneity of genomes: measures and values. Proc Natl Acad Sci U S A 91: 12837–12841.
- 29. Blaisdell BE, Campbell AM, Karlin S (1996) Similarities and dissimilarities of phage genomes. Proc Natl Acad Sci U S A 93: 5854–5859.
- 30. Karlin S, Mrazek J, Campbell AM (1997) Compositional biases of bacterial genomes and evolutionary implications. J Bacteriol 179: 3899–3913.
- 31. Lerat E, Capy P, Biemont C (2002) The relative abundance of dinucleotides in transposable elements in five species. Mol Biol Evol 19: 964–967.
- 32. Mehta P, Casjens S, Krishnaswamy S (2004) Analysis of the lambdoid prophage element e14 in the E. coli K-12 genome. BMC Microbiol 4: 4.
- 33. Rao GV, Mehta P, Srividhya KV, Krishnaswamy S (2005) A protein similarity approach for detecting prophage regions in bacterial genomes. Genome Biology 6: p11.
- 34. Srividhya KV, Rao GeetaV, Raghavenderan L, Mehta Preeti, Pirulsky Jaime, Manicka Sankarnarayanan, Sussman JoelL, Krishnaswamy S (2006) Database and Comparative Identification of prophages. LNCIS 344: 863–868.
- 35. Bose M, Barber RD (2006) Prophage Finder: a prophage loci prediction tool for prokaryotic genome sequences. In Silico Biol 6: 223–227.
- 36. Fouts DE (2006) Phage_Finder: automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res 34: 5839–5851.
- 37. Sozhamannan S, Chute MD, McAfee FD, Fouts DE, Akmal A, Galloway DR, Mateczun A, Baillie LW, Read TD (2006) The Bacillus anthracis chromosome contains four conserved, excision-proficient, putative prophages. BMC Microbiol 6: 34.
- 38. Bell, et al. (2004) Genome sequence of the enterobacterial phytopathogen Erwinia carotovora subsp. atroseptica and characterization of virulence factors. Proc Natl Acad Sci U S A 101: 11105–11110.
- 39. Covacci A, Falkow S, Berg DE, Rappuoli R (1997) Did the inheritance of a pathogenicity island modify the virulence of Helicobacter pylori? Trends Microbiol 5: 205–208.
- 40. Koen AL, Smet D, Kempsell K, Gallagher A, Duncan K, Young D (1999) Alteration of a single amino acid residue reverses fosfomycin resistance of recombinant MurA from Mycobacterium tuberculosis. Microbiology 145: 3177–3184.
- 41. Ramakrishnan L, Federspiel NA, Falkow S (2000) Granuloma-specific expression of Mycobacterium virulence proteins from the glycine-rich PE-PGRS family. Science 288: 1436–1439.
- 42. Banu S, Honore N, Saint-Joanis B, Philpott D, Prevost MC, Cole ST (2002) Are the PE PGRS proteins of Mycobacterium tuberculosis variable surface antigens? Mol Microbiol 44: 9–19.
- 43. Chiu CH, Tang P, Chu C, Hu S, Bao Q, Yu J, Chou YY, Wang HS, Lee YS (2005) The genome sequence of Salmonella enterica serovar Choleraesuis, a highly invasive and resistant zoonotic pathogen. Nucleic Acids Res 33: 1690–1698.
- 44. McClelland M, et al. (2004) Comparison of genome degradation in Paratyphi A and Typhi, human-restricted serovars of Salmonella enterica that cause typhoid. Nat Genet 36: 1268–1274.
- 45. Maurelli AT, Fernandez RE, Bloch CA, Rode CK, Fasano A (1998) “Black holes” and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli. Proc Natl Acad Sci U S A 95: 3943–3948.
- 46. Boyd EF, Moyer KE, Shi L, Waldor MK (2000) Infectious CTXPhi and the vibrio pathogenicity island prophage in Vibrio mimicus: evidence for recent horizontal transfer between V. mimicus and V. cholerae. Infect Immun 68: 1507–1513.
- 47. Makarov V (2002) Computer programs for eukaryotic gene prediction. Briefings in Bioinformatics 3: 195–199.