Comparative Screening of Digestion Tract Toxic Genes in Proteus mirabilis

Proteus mirabilis is a common urinary tract pathogen that may induce various inflammatory symptoms. Its notorious ability to resist multiple antibiotics and to form urinary tract stones makes treatment a long and painful process, which is further complicated by frequent horizontal gene transfer events in P. mirabilis genomes. Three P. mirabilis strains (C02011, C04010 and C04013) were isolated during a local food-poisoning outbreak in Shenzhen, China. We hypothesized that these strains may have horizontally acquired new genes that confer digestion tract infectivity and toxicity. Functional characterization of the three genomes shows that each of them independently acquired dozens of virulence genes horizontally from other microbial genomes. The representative strain C02011, which induces both vomiting and diarrhea, has recently acquired a complete type IV secretion system and digestion tract toxic genes from other bacteria.


Introduction
The gram-negative, facultatively anaerobic bacterium Proteus mirabilis is one of the major infectious agents of the genus Proteus, and may cause severe pain by forming stones in the urinary tract [1]. P. mirabilis may induce various pathogenic symptoms, such as fever, chills and chest pain [2]. Its motility and adhesion to solid surfaces allow P. mirabilis to spread easily through medical devices in hospitals [2]. P. mirabilis is also notorious for its ability to actively acquire antibiotic resistance and toxin genes from other microbial genomes through horizontal gene transfer [3,4].
A number of molecular typing technologies have been developed for the rapid screening of Proteus mirabilis in clinical samples. UreC is the alpha subunit of the urease that P. mirabilis uses to degrade urea [5]. The secondary metabolite repressor gene rsmA is involved in swarming motility and in regulating the expression of virulence factors in P. mirabilis [6]. P. mirabilis also encodes the hemolysin HpmA, a kidney damage inducer with significant cytotoxicity [7]. The virulence factor ZapA, a metalloprotease, has been shown through inhibition profiles to mediate IgA hydrolysis [8]. Social network analysis techniques have been applied to characterize disease-associated microRNAs [9,10], and the improved detection performance suggests that techniques from different areas may complement each other on the challenging problem of detecting disease-microRNA associations [11,12]; such cross-disciplinary approaches may also benefit the detection of pathogenicity genes in P. mirabilis.
This study hypothesized that P. mirabilis exerts its infectious and toxic functions in a new location, the digestion tract, through site-specific genes. Three P. mirabilis strains were isolated from a local food-poisoning outbreak, and two additional non-toxic strains were isolated from the same location after the outbreak. Genes and repetitive elements were annotated for all P. mirabilis strains, and a comprehensive screening for digestion tract toxicity (DTT) specific genes was carried out for the infectious strains. The comparative study supports our hypothesis and suggests that some strain-specific genes may serve as candidates for rapid strain typing and treatment selection.

Strain isolations
During a food-poisoning outbreak in Shenzhen in June 2002, three P. mirabilis strains were collected from samples of different patients. Strain C02011 was isolated from a patient's vomit sample and showed the same pulsed-field gel electrophoresis (PFGE) band pattern as the P. mirabilis strain isolated from the diarrheal stool, so C02011 was kept as the representative strain for further analysis. Strain C04010 was isolated from the diarrheal stool sample of a patient in the same event. A third strain, C04013, was isolated from the food consumed by the patients immediately before the outbreak.
These three P. mirabilis strains were compared with two locally isolated strains and two reference strains. The two local strains, C02034 and B02005, were isolated from food and from a food worker's stool during a non-outbreak period at the same location. The two reference strains, HI4320 [2] and BB2000 [13], are urinary tract infection strains and were also included as controls in the comparative genomics investigation.
This study was approved by the Ethics Committee of the Shenzhen Center for Disease Control and Prevention on June 9, 2002. The authors did not collect samples directly from the patients but received them from local hospitals. This study analyzed genome sequencing data of the pathogens and did not access the patients' personal data; therefore, no informed consent was obtained from the patients.

Genomic sequencing and gene annotation
The five newly isolated strains were sequenced and assembled by BGI-Shenzhen. Two libraries with insert sizes of 500 bp and 2,000 bp were constructed for paired-end (PE) sequencing on an Illumina HiSeq 2000. Read lengths were 90 bp (PE-90) and 49 bp (PE-49) for the 500-bp and 2,000-bp libraries, respectively. The 500-bp paired-end reads were assembled de novo with SOAPdenovo version 1.06, and the contigs were then joined into scaffolds using both the 500-bp and 2,000-bp PE reads [14]. Genes were predicted with Glimmer version 3.02 [15]. Gene functions were annotated by NCBI BLAST searches against the NCBI NR [16], SwissProt/TrEMBL [17], COG [18] and KEGG [19] databases. The data are summarized in the Results and Discussion section. The transcriptome was not investigated in this study; microRNA precursor genes will be characterized in future work using both in silico techniques [20] and RNA-seq experiments [21].
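The function annotation step relies on BLAST searches. A minimal sketch of one common way to turn BLAST+ tabular output (`-outfmt 6`, default 12 columns) into a best-hit annotation per gene is shown below; the helper name `best_hits` and the E-value cutoff of 1e-5 are assumptions for illustration, not the settings used by BGI.

```python
import csv
import io
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Map each query gene to (subject_id, bitscore) of its highest-scoring hit.

    Hits with an E-value above max_evalue (an assumed cutoff) are ignored.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < len(OUTFMT6_FIELDS) or row[0].startswith("#"):
            continue  # skip blank, truncated or comment lines
        rec = dict(zip(OUTFMT6_FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return best


# Usage with two hypothetical hits for one gene; the weaker hit is discarded.
example = io.StringIO(
    "gene1\tsubjA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "gene1\tsubjB\t60.0\t280\t100\t3\t5\t290\t2\t285\t1e-40\t160\n"
)
print(best_hits(example))
```

Keeping only the top bitscore per gene is the simplest policy; annotation pipelines often also require minimum identity or query coverage before transferring a function.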

Annotation of repetitive elements
Repetitive elements are not annotated by default in genome sequencing services, so a comprehensive annotation was conducted for the five sequenced strains and the two reference strains of P. mirabilis investigated in this study.
Repetitive elements were annotated with RepeatMasker version 4.05 [22], and tandem repeats were identified de novo with TRF version 4.04 [23]. Prokaryotic genomes carry transposable elements such as Insertion Sequence (IS) elements [24,25] and Miniature Inverted-repeat Transposable Elements (MITEs) [26-28]. Full copies of IS elements were detected and curated by multiple sequence alignment of the IS-encoded transposase genes using MEGA version 6.06 [29]. These full-length IS copies, together with all other full-length IS copies from the ISfinder database [30], were used as templates to screen for all IS copies in the seven P. mirabilis strains. No MITEs were detected by MUST version 1.0 [31].
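Template-based screening of IS copies can be illustrated with a short sketch. It assumes the alignments of IS templates against a genome have already been parsed into records, treats a hit as a full-length copy when it covers at least 90% of its template (an assumed threshold), and counts overlapping hits from different templates at the same locus only once. The `ISHit` record and `full_length_copies` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ISHit:
    template: str      # IS template name (e.g. an ISfinder entry)
    template_len: int  # full length of the template in bp
    t_start: int       # aligned region on the template (1-based, inclusive)
    t_end: int
    contig: str        # genome contig/scaffold carrying the hit
    g_start: int       # aligned region on the genome; g_start > g_end on the minus strand
    g_end: int


def full_length_copies(hits: List[ISHit], min_cov: float = 0.9) -> List[ISHit]:
    """Keep hits covering >= min_cov of their template (assumed threshold),
    then collapse hits from different templates that overlap one locus."""
    full = [h for h in hits
            if (abs(h.t_end - h.t_start) + 1) / h.template_len >= min_cov]
    # Sort by genomic start; at equal starts, the longer template alignment comes first.
    full.sort(key=lambda h: (h.contig, min(h.g_start, h.g_end),
                             -(abs(h.t_end - h.t_start) + 1)))
    kept: List[ISHit] = []
    for h in full:
        lo = min(h.g_start, h.g_end)
        if kept and kept[-1].contig == h.contig and lo <= max(kept[-1].g_start, kept[-1].g_end):
            continue  # overlaps the previously kept copy: same locus
        kept.append(h)
    return kept
```

Because kept loci are processed in start order and never overlap, comparing each hit with the last kept copy is enough to detect duplicates.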
CRISPRs are arrays of short, consecutive direct repeats separated by spacers, and the spacer sequences are usually acquired from invading foreign genetic elements [32,33]. The program CRISPRfinder [34] was used to detect CRISPRs in the seven investigated P. mirabilis genomes.
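To illustrate the repeat-spacer structure described above, the toy sketch below searches for exact direct repeats of a fixed length that recur with spacer-sized gaps. It is not the CRISPRfinder algorithm, which also tolerates repeat mismatches and checks further array features; the repeat length, minimum copy number and spacer bounds are assumed parameters, and shifted k-mers of one repeat are reported separately rather than merged.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_repeat_arrays(seq: str, k: int = 23, min_copies: int = 3,
                       min_spacer: int = 20, max_spacer: int = 60) -> List[Tuple[str, List[int]]]:
    """Toy CRISPR-like array search.

    Reports an exact k-bp direct repeat that occurs at least min_copies times
    in a row, with every gap between neighbouring copies (the spacer) lying in
    [min_spacer, max_spacer]. Returns (repeat, 0-based start positions).
    """
    seq = seq.upper()
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    arrays: List[Tuple[str, List[int]]] = []
    for repeat, pos in positions.items():
        if len(pos) < min_copies:
            continue
        run = [pos[0]]
        for prev, cur in zip(pos, pos[1:]):
            spacer = cur - prev - k
            if min_spacer <= spacer <= max_spacer:
                run.append(cur)  # copy continues the current array
            else:
                if len(run) >= min_copies:
                    arrays.append((repeat, run))
                run = [cur]  # start a new candidate array
        if len(run) >= min_copies:
            arrays.append((repeat, run))
    return arrays
```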

Generation of the phylogenetic tree
A phylogenetic tree based on the 16S ribosomal RNA (rRNA) genes was generated for the five P. mirabilis strains sequenced in this study and three additional P. mirabilis reference strains published elsewhere. The 16S rRNA genes of the five newly sequenced strains were annotated as described above. The 16S rRNA genes of the two P. mirabilis strains HI4320 and BB2000 were retrieved from the NCBI database. C05028 is another publicly available strain with DTT function [35]; its 16S rRNA gene was extracted using NCBI MegaBLAST version 2.2.32+ [36] with the 16S rRNA gene of P. mirabilis HI4320 as the template. The tree was rooted at Proteus penneri ATCC 35198, whose 16S rRNA gene was detected with the HI4320 template gene using NCBI BLASTN version 2.2.32+ [36]. The resulting E-value of 0.0 indicates that the two 16S rRNA sequences are far more similar than expected by chance, consistent with the two species being close relatives. The phylogenetic tree was generated using Phylogeny.fr [37]. All programs were run with default parameters.
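Phylogeny.fr runs its own alignment and tree-inference pipeline. As a much simpler illustration of the input to distance-based tree building, the sketch below computes pairwise p-distances between already aligned 16S rRNA sequences and applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3). The helper names are hypothetical, and this is not the method used to build Fig 2.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(aln: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Uncorrected p-distances for every unordered pair of named sequences."""
    return {(a, b): p_distance(aln[a], aln[b]) for a, b in combinations(sorted(aln), 2)}
```

A distance method such as neighbor joining could then build a tree from this matrix; maximum-likelihood methods instead work on the alignment directly.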

Screening for the strain specific genes
Homologs of each gene in a given genome were screened using NCBI MegaBLAST version 2.2.32+ [36] with default parameters. Only genes with no detectable homologs were kept as strain-specific candidates.
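The strain-specific filter reduces to a set difference: every gene of the query strain minus those with at least one MegaBLAST hit in the comparison genomes. A minimal sketch over pooled BLAST+ tabular output (`-outfmt 6`) is below; the E-value cutoff is an assumed parameter (the study reports default settings), and `strain_specific` is a hypothetical helper.

```python
from typing import Iterable, Set


def strain_specific(query_ids: Iterable[str], hit_lines: Iterable[str],
                    max_evalue: float = 1e-5) -> Set[str]:
    """Genes in query_ids with no hit (E <= max_evalue, an assumed cutoff)
    in any comparison genome. hit_lines are -outfmt 6 rows pooled over all
    comparison genomes; column 1 is the query ID, column 11 the E-value."""
    with_homolog = set()
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or fields[0].startswith("#"):
            continue  # skip comments and malformed rows
        if float(fields[10]) <= max_evalue:
            with_homolog.add(fields[0])
    return set(query_ids) - with_homolog
```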

Results and Discussion
In vitro PFGE strain typing
Three of the isolated P. mirabilis strains showed unusual infectivity and toxicity in the digestion tract, whereas the two other strains did not. The first question, therefore, is whether existing molecular typing technologies can differentiate these strains by their pathogenicity.
PFGE is a widely used rapid subtyping technique for the macrorestriction analysis of pathogens and is recommended by the PulseNet laboratories [38]. Genomic fragment profiles were generated with 50 U of the restriction enzyme SfiI in 200 ml of buffer, following a standard protocol [39]. After the gel was scanned as a TIFF image, a dendrogram was computed with BioNumerics version 5.0 (Applied Maths BVBA, Belgium), similar to [40].
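PFGE dendrograms are typically built from pairwise band-pattern similarities. The sketch below computes the Dice coefficient, 2 × matched bands / (bands in A + bands in B), a widely used choice for PFGE comparisons. The greedy matching and the 1.5% relative size tolerance are simplifying assumptions (BioNumerics uses its own band-matching settings), and the resulting similarity matrix would then be clustered, for example with UPGMA.

```python
from typing import List


def dice_similarity(bands_a: List[float], bands_b: List[float], tol: float = 0.015) -> float:
    """Dice coefficient between two PFGE band lists (fragment sizes in kb).

    Two bands match if their sizes differ by at most tol (an assumed relative
    tolerance, measured against the larger band); each band in bands_b is
    used at most once, matched greedily from the smallest size up.
    """
    if not bands_a and not bands_b:
        return 1.0
    unmatched = sorted(bands_b)
    matches = 0
    for a in sorted(bands_a):
        for i, b in enumerate(unmatched):
            if abs(a - b) <= tol * max(a, b):
                matches += 1
                del unmatched[i]  # a band cannot be matched twice
                break
    return 2 * matches / (len(bands_a) + len(bands_b))
```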
The restriction enzyme SfiI cuts the genomic sequence into fragments of 20-700 kbp, and the PFGE profiles in Fig 1 show significant differences even between closely related P. mirabilis strains. The inconsistency between the PFGE profiles of these six P. mirabilis strains and their pathogenicity suggests that these genomes may be under strong selection pressure for genomic change, and may undergo both intra-genome structural variation and inter-genome (inter-strain) horizontal transfer.
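The fragment sizes depend on where SfiI sites fall in the genome. SfiI recognizes the interrupted palindrome GGCCNNNNNGGCC and cuts the top strand as GGCCNNNN^NGGCC, so an in silico digest only needs the site positions. The sketch below assumes a single circular chromosome, ignores sites that span the sequence origin, and uses a hypothetical `sfii_fragments` helper.

```python
import re
from typing import List

# SfiI site GGCCNNNNNGGCC; the lookahead allows overlapping matches.
SFII_SITE = re.compile(r"(?=GGCC[ACGT]{5}GGCC)")
SFII_CUT_OFFSET = 8  # top-strand cut: GGCCNNNN^NGGCC


def sfii_fragments(seq: str, circular: bool = True) -> List[int]:
    """Fragment lengths (bp) of an in silico SfiI digest, in genome order.

    Sites spanning the origin of a circular sequence are not detected.
    """
    seq = seq.upper()
    cuts = [m.start() + SFII_CUT_OFFSET for m in SFII_SITE.finditer(seq)]
    n = len(seq)
    if not cuts:
        return [n]  # uncut molecule
    if circular:
        # Fragment i spans cut i to cut i+1; the last one wraps past the origin.
        # With a single cut the difference is 0, so the whole length is returned.
        return [((cuts[(i + 1) % len(cuts)] - c) % n) or n for i, c in enumerate(cuts)]
    bounds = [0] + cuts + [n]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

Applied to an assembled chromosome, the list of fragment sizes can be compared with the observed band sizes.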

In vitro PCR strain typing
The polymerase chain reaction (PCR) is another common technique for subtyping pathogens based on the presence of known pathogenic factors [41]. PCR-based screening of the genes ureC, rsmA, hpmA and zapA was conducted in the five newly isolated P. mirabilis strains. The primers for these four genes were designed in-house (Table 1), and a standard PCR protocol was followed.
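Primer pairs are often checked in silico before wet-lab PCR. The sketch below finds exact matches of a forward primer and of the reverse complement of a reverse primer on one strand of a template, and reports the expected product size. The primers in Table 1 are not reproduced here, so any sequence passed to it is hypothetical; practical primer checks also allow mismatches and search both strands.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Exact-match in silico PCR on the + strand of template.

    forward is written 5'->3' on the + strand; reverse is written 5'->3' on
    the - strand, as primers are usually listed. Returns the product size in
    bp, or None if either primer site is missing or in the wrong order.
    """
    template = template.upper()
    f = template.find(forward.upper())
    if f < 0:
        return None
    r_site = reverse_complement(reverse.upper())
    r = template.find(r_site, f)  # reverse site must lie downstream of forward
    if r < 0:
        return None
    return r + len(r_site) - f
```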
Unfortunately, these genes do not distinguish the digestion tract toxic strains from the other strains, since all five strains carry all four genes (Table 2). The existing pathogen subtyping techniques are therefore not precise enough to differentiate these five P. mirabilis strains. This study thus carried out a comprehensive screening of functional elements that are specific to the three digestion tract toxic strains, compared with the two other digestion tract infectious but non-toxic strains. These strain-specific genes may serve as candidate subtyping targets.

Summary of the seven P. mirabilis strains
P. mirabilis strain C02011 induces both vomiting and diarrhea, and was chosen as the reference genome for screening DTT genes. Among the seven P. mirabilis strains, C02034 and HI4320 have genome sizes larger than 4 Mbp, whereas all other strains have genome sizes between 3.80 and 3.85 Mbp (Table 3). Interestingly, these two largest genomes also carry the largest numbers of IS elements, 28 in C02034 and 24 in HI4320. This is consistent with previous observations that IS number tends to be positively correlated with genome size [25,42]. All seven strains have a similar GC content of 38.38-38.88%. Pearson's correlation test between genome size (Gsize) and gene number (Gene#) across the seven strains gives P = 7.341e-4, so larger genomes tend to encode more genes. Only strains C02011 and C02034 carry CRISPRs, with two each.
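The correlation test above can be recomputed from the Gsize and Gene# columns of Table 3. The sketch computes Pearson's r and the t statistic t = r√((n - 2)/(1 - r²)) with n - 2 degrees of freedom; the two-sided P-value is the tail probability of that t value, which `scipy.stats.pearsonr` also returns directly. No Table 3 values are hard-coded here, and `pearson_r_t` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def pearson_r_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation r and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    r = sxy / math.sqrt(sxx * syy)
    # A perfect correlation gives an infinite t statistic (P = 0).
    t = math.copysign(math.inf, r) if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
    return r, t
```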
All seven P. mirabilis strains encode drug resistance genes, but only the three strains with DTT function encode toxin genes (Table 3). The type IV secretion system (T4SS) does not seem to be a major secretion system in P. mirabilis, with 3 T4SS genes in each strain except C02011. In addition, each of the three digestion tract toxic strains encodes about 60 genes that are not detected in the other P. mirabilis strains. The rest of this study focuses on these DTT-specific genes, which may have facilitated the severe pathogenesis observed during the local food-poisoning event.
Table 3. Columns "Gsize (bps)" and "GC" give the estimated genome size and the G+C content of each strain. The numbers of annotated genes and IS elements are given in the columns "Gene#" and "IS". The number of annotated CRISPRs is given in the column "CRISPR". The numbers of drug resistance genes and toxin genes are based on the gene function annotation and are listed in the columns "rDrug" and "Toxin", respectively. The number of T4SS genes, summarized from the gene annotations, is given in the column "T4SS". The column "DTT specific" gives the number of genes with no homologs in the non-toxic P. mirabilis strains. "-" means that the item was not analyzed for this strain. The gene annotation of strain C05028 is not publicly available and is not included in this table.

Phylogenetic traces
P. mirabilis is notorious for infecting the urinary tract and forming stones there, which usually causes severe pain. The two publicly available strains BB2000 and HI4320 may induce various systemic symptoms when infecting the urinary tract [2,13]. By contrast, the five strains isolated in Shenzhen and strain C05028 seem to have gained the ability to infect the digestion tract, and four of them show DTT function during infection.
Based on the phylogenetic tree in Fig 2, the five locally isolated P. mirabilis strains and strain C05028 share a common ancestor, whereas the two urinary tract infectious strains BB2000 and HI4320 descend from another common ancestor. The two non-toxic strains C02034 and B02005 are not clearly separated from the three digestion tract toxic (DTT) strains on the tree. Because phylogeny alone does not separate toxic from non-toxic strains, genes specific to the DTT strains may play an essential role in infecting and inducing toxicity in the digestion tract.

DTT specific elements
Most DTT-specific genes are also strain-specific (Fig 3). The DTT-specific genes have no homologs in the two non-toxic strains C02034 and B02005 or in the two urinary tract infectious strains HI4320 and BB2000, and over 70% of them are detected in only one of the three DTT strains. Moreover, none of the 45 genes specific to strain C02011 has a detectable homolog in the publicly available strain P. mirabilis C05028. Considering the phylogenetic relationships among the three DTT strains (Fig 2), the current data strongly suggest that all three P. mirabilis DTT strains are actively and independently acquiring foreign genes from neighboring environmental microbes.
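The "over 70%" figure is a presence/absence count. A minimal sketch, assuming the DTT-specific genes have already been grouped into homolog clusters, each mapped to the set of DTT strains that carry it, is shown below; `single_strain_fraction` is a hypothetical helper, and the cluster and strain names in the test are placeholders.

```python
from typing import Dict, Set


def single_strain_fraction(presence: Dict[str, Set[str]], strains: Set[str]) -> float:
    """Fraction of gene clusters found in exactly one of the given strains.

    presence maps a cluster ID to the set of strains carrying it; clusters
    absent from all of the given strains are ignored.
    """
    relevant = [carriers & strains for carriers in presence.values() if carriers & strains]
    if not relevant:
        return 0.0
    return sum(len(carriers) == 1 for carriers in relevant) / len(relevant)
```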

Core module for DTT function
The most infectious and toxic strain, C02011, encodes 45 genes with no homologs in any of the seven other strains, suggesting that these genes may have been acquired through horizontal gene transfer. Functional annotation shows that these 45 genes do not confer new antibiotic resistance on strain C02011 (Table 4 and S1 Table). Three large genomic islands, each with at least five consecutive genes, were detected in strain C02011. The largest genomic island encodes 15 genes, including a complete T4SS. The T4SS is a transmembrane secretion apparatus composed of multiple protein components, which various gram-negative bacteria use to transport virulence factors between the pathogen and host cells [43]. Among previously sequenced P. mirabilis strains, a T4SS has only been observed in the urinary tract infecting strain HI4320 [2,44,45]. Moreover, the T4SS in strain C02011 shows no primary sequence similarity to that in HI4320, consistent with the observation that T4SS modules from different bacteria show little sequence conservation [43,46-49].
Table 4. The three large genomic islands are highlighted in bold in the column "gene_id". The gene annotations are summarized from S1 Table.
The largest genomic island also encodes three genes with significant homologs in the plasmid pMET-1 from the gammaproteobacterium Klebsiella pneumoniae [50]. Among these three genes, mobB encodes three transmembrane domains and ORF15 encodes an N-terminal signal peptide. The other two genomic islands encode genes with various phage-related functions (S1 Table). Besides these transmembrane transport genes, strain C02011 also encodes a gene, C02011GL002300, that is homologous to a phage terminase from the food-poisoning pathogen Yersinia enterocolitica 8081 (S1 Table). Y. enterocolitica 8081 belongs to serotype O:8 and biotype 1B of Y. enterocolitica, and may induce severe diarrhea [51].
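The genomic islands in C02011 were defined as runs of at least five consecutive strain-specific genes. A sketch of that rule is below, assuming genes are listed in genome order for a single scaffold (runs should not be joined across scaffold ends); `genomic_islands` is a hypothetical helper.

```python
from typing import List, Sequence, Set


def genomic_islands(gene_order: Sequence[str], specific: Set[str],
                    min_genes: int = 5) -> List[List[str]]:
    """Runs of at least min_genes consecutive genes (in genome order on one
    scaffold) that all belong to the strain-specific set."""
    islands: List[List[str]] = []
    run: List[str] = []
    for gene in gene_order:
        if gene in specific:
            run.append(gene)
        else:
            if len(run) >= min_genes:
                islands.append(run)
            run = []  # a non-specific gene breaks the run
    if len(run) >= min_genes:
        islands.append(run)  # run reaching the scaffold end
    return islands
```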
The most recent common ancestor of P. mirabilis and Y. enterocolitica lies at the family level, Enterobacteriaceae. These data suggest that P. mirabilis C02011 may have horizontally acquired genes linked to severe digestion tract toxicity from a different genus, Yersinia.
S1 Table. Annotations of the 45 genes specific to P. mirabilis C02011. The three candidate genomic islands are highlighted in bold in the column "gene_id". The annotation was provided by BGI using the reference databases NCBI NR, SwissProt, TrEMBL, COG and KEGG. (DOCX)