dndDB: A Database Focused on Phosphorothioation of the DNA Backbone

Background: The Dnd DNA degradation phenotype was first observed during electrophoresis of genomic DNA from Streptomyces lividans more than 20 years ago. It was subsequently shown to be governed by the five-gene dnd cluster. Similar gene clusters have now been found to be widespread among many other distantly related bacteria. Recently the dnd cluster was shown to mediate the incorporation of sulphur into the DNA backbone via a sequence-selective, stereo-specific phosphorothioate modification in Escherichia coli B7A. Intriguingly, to date all identified dnd clusters lie within mobile genetic elements, the vast majority in laterally transferred genomic islands.
Methodology: We organized available data from experimental and bioinformatics analyses about the DNA phosphorothioation phenomenon and associated documentation as a dndDB database. It contains the following detailed information: (i) Dnd phenotype; (ii) dnd gene clusters; (iii) genomic islands harbouring dnd genes; (iv) Dnd proteins and conserved domains. As of 25 December 2008, dndDB contained data corresponding to 24 bacterial species exhibiting the Dnd phenotype reported in the scientific literature. In addition, via in silico analysis, dndDB identified 26 syntenic dnd clusters from 25 species of Eubacteria and Archaea, 25 dnd-bearing genomic islands and one dnd plasmid containing 114 dnd genes. A further 397 other genes coding for proteins with varying levels of similarity to Dnd proteins were also included in dndDB. A broad range of similarity search, sequence alignment and phylogenetic tools are readily accessible to support individualized directions of research focused on dnd genes.
Conclusion: dndDB can facilitate efficient investigation of a wide range of aspects relating to dnd DNA modification and other island-encoded functions in host organisms. dndDB version 1.0 is freely available at http://mml.sjtu.edu.cn/dndDB/.


Introduction
The Dnd DNA degradation phenotype was observed during normal and pulsed-field gel electrophoresis of genomic DNA from Streptomyces lividans strain 66 [1]. DNA degradation during electrophoresis in the presence of tris, a commonly used biological buffer, has also been reported in many other distantly related bacterial species, such as Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, Vibrio parahaemolyticus, Pseudomonas aeruginosa, Pseudomonas fluorescens, Mycobacterium abscessus, Clostridium botulinum, and Clostridium difficile. The Dnd phenotype was thought to involve a postreplicative DNA modification that rendered DNA susceptible to degradation at the electrophoretic anode. In 2005, the five-gene dndABCDE cluster responsible for this phenotype was identified in S. lividans [2]. Zhou et al. [2] demonstrated that the affected DNA had been modified in vivo by the addition of a sulphur-containing molecule through a likely biochemical pathway mediated by enzymes encoded by the dnd locus.
More recently the dnd cluster was shown to mediate the incorporation of sulphur into the DNA backbone via a sequence-selective, stereo-specific phosphorothioate modification in E. coli B7A [3]. By using high-performance liquid chromatography and mass spectrometry, the chemical structure of phosphorothioated DNA was determined, revealing a sulfur atom in place of one of the nonbridging oxygen atoms on a DNA backbone-borne phosphate group. To our knowledge, this was the first report of a natural modification of the DNA backbone itself, which sets it apart from well-documented DNA methylation and other changes to DNA bases.
Intriguingly, the S. lividans dnd cluster lay within a large, mosaic genomic island named SLG [4,5]. To date all 26 identified dnd clusters are borne on likely mobile genetic elements, twenty-five of which are harboured on genomic islands, fragments of alien DNA that have been incorporated into chromosomes of new hosts via horizontal gene transfer events [6].
The observed Dnd phenotype and recent microbiological, genetic and biochemical advances in the field have been reported in the scientific literature. However, disparate PubMed references and individual genome annotation and protein data deposited in public databases do not provide a unified resource required to facilitate the advanced searches, analyses and data manipulation necessary to fully exploit the available and rapidly emerging new data in the Dnd field. Consequently, we have created a MySQL database, dndDB, to efficiently organize all available data from experimental and bioinformatics analyses about the phosphorothioation of DNA in Eubacteria and Archaea and provide a central repository of associated documentation. We propose that our evolving, web-based dndDB resource will stimulate and facilitate research into many key questions, including the mechanism of sulfur incorporation, the biological significance of this DNA modification, the role, source and mode of dissemination of dnd-bearing genomic islands, and the potential for exploitation of these systems for biotechnological applications.

Results and Discussion
The purpose of dndDB is to provide a user-friendly interactive platform that not only archives, analyses and manipulates the growing body of data about bacterial and archaeal dnd genes, linked island-borne genes, matching sets of cognate proteins, and the DNA phosphorothioation process itself, but also empowers researchers from different backgrounds to explore novel angles potentially related to this, thus far, unique DNA backbone modification process. A broad range of similarity search, sequence alignment and phylogenetic tools are readily accessible to allow for user-directed interrogation of the database, examination of user-supplied sequences and other individualized directions of research.
Figure 1. Inferred phylogenetic relationship of the 31 bacterial and one archaeal organism carrying known dnd clusters (denoted by orange 'G' balls) and/or documented to exhibit the Dnd phenotype (denoted by purple 'P' balls). The tree shown was constructed on the basis of NCBI taxonomy (http://www.ncbi.nlm.nih.gov/Taxonomy/) by using iTOL [11], which is now accessible via dndDB. doi:10.1371/journal.pone.0005132.g001

Organisms exhibiting the Dnd phenotype
Electrophoresis-associated DNA degradation, otherwise known as the Dnd phenotype, is a puzzling and long-standing phenomenon frequently observed during pulsed-field gel electrophoresis (PFGE), when a smear pattern results instead of discrete bands. The current version of dndDB includes a description of the Dnd phenotype in 24 bacterial species based on information extracted from PubMed references. The phylogenetic diversity and wide prokaryotic representation of these Dnd phenotype-positive organisms, and of others that we have shown to harbour dnd gene clusters, are shown in Figure 1. These data are tabulated and easily retrieved using the 'Search' tool in dndDB. In addition, users can download an optimized Dnd phenotype verification protocol which utilizes activated tris-acetate-EDTA (TAE) buffer during agarose gel electrophoresis to check the Dnd phenotype of bacterial strains of interest. A simple PCR-based protocol, developed using dndDB, to identify potential dndC gene homologues in bacterial isolates is also provided. This method is also intended to serve as a template for other dndDB-facilitated PCR-based screening assays.

Horizontal gene transfer
Comparative analysis of dnd genes at a variety of granularities, such as the single gene, gene cluster, genomic island or genome-scale level, will greatly aid investigations into the evolution of dnd gene clusters and the mechanisms that brought about their widespread dissemination across diverse and distant bacterial species. In dndDB, a powerful multiple sequence alignment algorithm, Muscle v3.7 [7], and a Java alignment editor, JalView 2.4 [8], were integrated to facilitate the comparison of the dnd gene clusters from 24 taxonomically distinct bacterial species and one archaeal member from various geographic niches. In addition, the popular GBrowse viewer [9], which combines a database and an interactive web page, was employed for manipulating and displaying annotations on dnd-bearing genomes. Remarkably, all identified dnd clusters lay within larger mobile genetic elements: 23 within chromosomal islands, 2 within islands in the plasmid-derived chromosome II of Pseudoalteromonas haloplanktis TAC125 and Vibrio fischeri MJ11, and one on the large Plasmid 3 of Mesorhizobium sp. BNC1 (see Table 1 for details). Analysis of these putative dnd-encoding islands demonstrated common key features typical of genomic islands (GIs): organism-atypical G+C contents, integration into tRNA genes, and/or possession of terminal direct repeats and integrase- and/or transposase-encoding sequences. Phylogenetic analysis of the dnd genes in the 26 identified dnd clusters confirmed the diverse nature of these sequences. Furthermore, significant discordances between the 16S rDNA- and dnd-derived phylogenetic trees, marked differences in the gene content within the remainder of the dnd islands, and the frequent absence of dnd islands in members of the same species strongly supported the notion that the diverse dnd clusters and their cognate islands had been acquired independently on many occasions, rather than arising from a single or limited number of vertical evolutionary events.
However, to date none of the defined dnd islands have been shown to be functionally mobile, though at least one, the S. lividans SLG island, is known to function as a typical, self-circularizing, site-specific integrative element [5]. We have also incorporated the SynView tool [10] into dndDB to facilitate larger scale synteny mapping so as to permit ready recognition of dnd island-borne orthologous genes. Figure 2 illustrates an example based on comparison of dnd islands from Escherichia coli, Salmonella enterica and Enterobacter sp. Such analyses will aid the identification of evolutionary links between members of this growing family of islands.
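One of the island features discussed in this section, organism-atypical G+C content, can be illustrated with a simple sliding-window scan. The following is a minimal sketch only and is not the method used by the island prediction tools cited in this paper; the function names, window size, step and deviation threshold are all assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, min_deviation=0.08):
    """Yield (start, end, gc) for windows whose G+C fraction differs from the
    genome-wide G+C fraction by at least min_deviation.

    window, step and min_deviation are illustrative assumptions, not
    thresholds taken from dndDB or from any cited tool.
    """
    genome_gc = gc_fraction(genome)
    last_start = max(len(genome) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = genome[start:start + window]
        gc = gc_fraction(chunk)
        if abs(gc - genome_gc) >= min_deviation:
            yield start, start + len(chunk), gc
```

Overlapping flagged windows would normally be merged into candidate regions and then checked for the other features listed above, such as a flanking tRNA gene, direct repeats or an integrase gene, before a region is called a putative island.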

Dnd proteins and conserved domains
Amino acid sequences of Dnd proteins from the diverse dnd-bearing hosts were multiply aligned with Muscle [7], then visualised and edited with JalView [8]. The neighbor-joining phylogenetic tree of matching 16S rRNA sequences was constructed by using Muscle and JalView. A phylogenetic tree based on NCBI taxonomy IDs of host organisms was also generated by using iTOL [11]. dndDB also contains a list of conserved domains and consensus sequences identified in Dnd proteins that have been previously deposited in the protein family database Pfam, the Conserved Domain Database (CDD), and/or the biological macromolecule 3-D structures database PDB [12]. In addition, hundreds of other proteins exhibiting lower levels of similarity to Dnd proteins, with Blastp E-values of less than E-4, were extracted from the NCBI nr database and stored in dndDB to allow for rapid identification of more distantly related potential homologues or proteins performing related functions.
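The E-value filtering step described above can be sketched as follows. This is a hedged illustration, not the pipeline used to build dndDB: it assumes that Blastp results are available in the standard 12-column tab-separated format (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score). The function name is hypothetical, and the default cut-off is a placeholder that should be set to the threshold stated in the text.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-4):
    """Return (query_id, subject_id, evalue) for hits with evalue < max_evalue.

    Assumes 12-column tab-separated BLAST output with the E-value in the
    11th column. max_evalue is a placeholder default (an assumption).
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        evalue = float(row[10])
        if evalue < max_evalue:
            hits.append((row[0], row[1], evalue))
    return hits
```

Keeping the cut-off as a parameter makes it easy to compare a strict set of close homologues with a more permissive set of weakly similar proteins.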
We have used dndDB and associated experimentation to analyse the DndA, DndB, DndC, DndD and DndE proteins of S. lividans and have used these data to predict their putative biological functions, thus shedding light on the novel DNA phosphorothioation biochemical pathway. The DndA protein is a likely cysteine desulfur-transferase that is proposed to provide sulphur via its L-cysteine desulfurylase activity (see Figure 3 for an outline of relevant data) [13]. DndB is a predicted Fe-S cluster binding protein, which we hypothesize affects modification specificity through its action as a transcriptional regulator. Similarly, DndC is proposed to contain a [4Fe-4S] cluster and has predicted ATP pyrophosphatase activity, features paralleling those of IscS and ThiI [14] which are involved in tRNA sulfur modification in Escherichia coli. DndD is a putative ATPase with DNA nicking activity which may couple ATP hydrolysis to DndE, a putative sulphur-transferase. However, much more detailed analyses and experimentation will be necessary to finalize the precise nature of the dnd biochemical pathway.

Search tools
The dndDB web server offers several search tools with varied options. Through the 'Search' page, users can retrieve Dnd phenotype, gene or protein homologues from dndDB by organism name. Via the 'Blast vs dndDB' page, users are able to blast a query sequence against dndDB to find homologous matches with WU-BLAST 2.0 [Gish, W., personal communication]. Finally, the 'tBlastn for Dnd' page utilizes an NCBI tBlastn-based tool that we developed to predict potential Dnd proteins in user-supplied nucleotide sequences.
As future developments, we will shortly be uploading a large set of sequences which exhibit homology to isolated dnd genes, as opposed to dnd clusters only, and a further set corresponding to homologues of the full complement of non-dnd genes borne on dnd islands. We will continue to identify additional syntenic clusters, isolated dnd-like genes and other dnd island gene homologues as gene, genome and metagenome databases expand, and anticipate eventually providing a pipeline for ready automated discovery, annotation and analyses of dnd genes, clusters and associated genomic islands.
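A tBlastn-style search compares protein queries with a nucleotide sequence that has been conceptually translated in all six reading frames. The sketch below illustrates only that translation step, using the standard genetic code. It is not the dndDB tool itself, and all function names are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For example, `six_frame_translations("ATGGCCTAA")["+1"]` gives `"MA*"`, where `*` marks a stop codon. In a real search each translated frame would then be aligned against the Dnd protein queries.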
We envisage an evolving resource that seeks to effectively combine and interlink the genetics, biochemistry and functional aspects of dnd systems and their associated genomic islands. Such a unified resource will facilitate efficient investigation of a wide range of aspects relating to dnd DNA modification processes and other island-encoded functions in diverse host organisms. We also believe that the lessons learnt from ongoing dissection of the dnd system will provide clues to resolve mysteries relating to weakly similar genes, proteins and biochemical reactions, and in due course give rise to novel biotechnological and/or clinical applications; thus we expect that dndDB will prove to be of interest to a broad community of researchers.
The current version of dndDB includes the following information: (i) List of 24 bacterial species exhibiting the Dnd phenotype and associated publications; (ii) Details of dnd gene clusters from 25 species of Eubacteria and Archaea that were identified based on both sequence similarity and gene order (synteny) by employing Blastp searches against complete and partially sequenced genomes available at the NCBI server; (iii) Details of laterally acquired genomic islands harbouring dnd genes that were predicted using the GBrowse viewer (Generic Genome Browser) [9], Mobilome-FINDER server [17], Z Curve database online utility [18] and/or interactive Artemis Comparison Tool (WebACT) [19]; (iv) Archive of Dnd proteins and other potentially related proteins showing BLAST-based similarity, and corresponding conserved domains identified in the protein family database Pfam [20] and the Conserved Domain Database (CDD) [21].
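Identifying clusters by both sequence similarity and gene order can be sketched as a scan for runs of neighbouring genes whose products match different Dnd protein families. The exact criteria used for dndDB are not given in the text, so the function name, the input layout, the minimum number of distinct families and the permitted gap between hits are all assumptions made for illustration.

```python
def find_syntenic_clusters(gene_hits, min_families=3, max_gap=1):
    """Group neighbouring Dnd-like genes into candidate clusters.

    gene_hits: list of (gene_id, family) tuples in chromosomal order, where
    family is a label such as "DndC" for genes with a significant Blastp
    hit and None otherwise.
    A cluster is a run of hit genes in which consecutive hits are separated
    by at most max_gap non-hit genes and which covers at least min_families
    distinct families. Both thresholds are illustrative assumptions.
    """
    clusters = []
    current = []
    last_hit_index = None
    for index, (gene_id, family) in enumerate(gene_hits):
        if family is None:
            continue
        # Close the current run if too many non-hit genes intervene.
        if last_hit_index is not None and index - last_hit_index - 1 > max_gap:
            if len({fam for _, fam in current}) >= min_families:
                clusters.append(current)
            current = []
        current.append((gene_id, family))
        last_hit_index = index
    if current and len({fam for _, fam in current}) >= min_families:
        clusters.append(current)
    return clusters
```

Under these assumptions an isolated dnd-like gene is not reported as a cluster, which mirrors the distinction drawn in the text between syntenic dnd clusters and isolated dnd homologues.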
dndDB currently contains details of over 114 dnd genes and their cognate proteins from the Eubacterial and Archaeal kingdoms, and is expected to grow quickly with the rapid development of genome sequencing projects and the ongoing refinement of strategies to identify distantly related gene clusters, orphan dnd genes, and functionally or biochemically related proteins. As more information about the Dnd system becomes available, the database will be expanded and improved accordingly.
In addition, brief descriptions of ongoing research into the dnd system by our group and collaborators are also incorporated into dndDB to foster dialogue and participation by the wider research community. These include work on a putative Dnd-dependent restriction-modification system, the precise nature of the DNA modification itself, the core sequence motif that targets the site-specific modification in S. lividans [22], and the increasingly well characterized novel biochemical pathway that mediates this unique biological process. Future contributions from other researchers will be sought via dndDB.
Figure 3 (partial caption). Sample experimental data demonstrating that DndA provides sulphur via its L-cysteine desulfurylase activity [13]. (F) Inferred biochemical reaction, in which DndA is predicted to catalyze the assembly of DndC as an iron-sulfur cluster protein [13]. doi:10.1371/journal.pone.0005132.g003