Comparative genomic analysis provides insight into the phylogeny and virulence of atypical enteropathogenic Escherichia coli strains from Brazil

Background Atypical enteropathogenic Escherichia coli (aEPEC) are one of the most frequent intestinal E. coli pathotypes isolated from diarrheal patients in Brazil. Isolates of aEPEC contain the locus of enterocyte effacement, but lack the genes of the bundle-forming pilus of typical EPEC, and the Shiga toxin of enterohemorrhagic E. coli (EHEC). The objective of this study was to evaluate the phylogeny and the gene content of Brazilian aEPEC genomes compared to a global aEPEC collection. Methodology Single nucleotide polymorphism (SNP)-based phylogenomic analysis was used to compare 106 sequenced Brazilian aEPEC with 221 aEPEC obtained from other geographic origins. Additionally, Large-Scale BLAST Score Ratio was used to determine the shared versus unique gene content of the aEPEC studied. Principal Findings Phylogenomic analysis demonstrated the 106 Brazilian aEPEC were present in phylogroups B1 (47.2%, 50/106), B2 (23.6%, 25/106), A (22.6%, 24/106), and E (6.6%, 7/106). Identification of EPEC and EHEC phylogenomic lineages demonstrated that 42.5% (45/106) of the Brazilian aEPEC were in four of the previously defined lineages: EPEC10 (17.9%, 19/106), EPEC9 (10.4%, 11/106), EHEC2 (7.5%, 8/106) and EPEC7 (6.6%, 7/106). Interestingly, an additional 28.3% (30/106) of the Brazilian aEPEC were identified in five novel lineages: EPEC11 (14.2%, 15/106), EPEC12 (4.7%, 5/106), EPEC13 (1.9%, 2/106), EPEC14 (5.7%, 6/106) and EPEC15 (1.9%, 2/106). We identified 246 genes that were more frequent among the aEPEC isolates from Brazil compared to the global aEPEC collection, including espG2, espT and espC (P<0.001). Moreover, the nleF gene was more frequently identified among Brazilian aEPEC isolates obtained from diarrheagenic patients when compared to healthy subjects (69.7% vs 41.2%, P<0.05). Conclusion The current study demonstrates significant genomic diversity among aEPEC from Brazil, with the identification of Brazilian aEPEC isolates to five novel EPEC lineages. The greater prevalence of some virulence genes among Brazilian aEPEC genomes could be important to the specific virulence strategies used by aEPEC in Brazil to cause diarrheal disease.

The main virulence feature of EPEC is its ability to produce the histopathological attaching and effacing (A/E) lesions on the surface of infected host cells [13]. The A/E lesion is characterized by the destruction of the microvilli brush border, intimate bacteria-cell adherence and formation of pedestal-like structures rich in F-actin and other cytoskeleton elements, which causes intestinal epithelial cells to have a reduced absorptive capacity [13,14]. The 41 genes associated with this phenotype are located in a pathogenicity island (PAI) termed the locus of enterocyte effacement (LEE) [15]. The LEE region encodes all the structural components of the type 3 secretion system (T3SS), as well as seven effector proteins, the intimin outer membrane protein, the intimin translocated receptor protein (Tir), chaperones, and transcriptional regulators [15][16][17][18]. Although enterohemorrhagic E. coli (EHEC) similarly contain the LEE and produce the A/E lesion on infected epithelial cells, EHEC also contain the stx genes, encoding the Shiga toxin, which is not present in EPEC [13]. Besides the T3SS effectors located in the LEE region, effectors encoded outside of LEE, known as non-LEE encoded effectors, have emerged as important accessory virulence factors in A/E pathogens [19,20]. These effectors are responsible for inducing several modifications in the host cells during infection, including inhibition of apoptosis, interference with inflammatory signaling pathways, invasion, tight junction disruption and phagocytosis [19,20].
Previous comparative genomic studies have demonstrated that EPEC exhibit considerable genomic diversity, with representatives in at least 10 different phylogenomic lineages [21,22]. This is most likely due to the acquisition of the LEE region and its stable retention by genomically diverse lineages of E. coli [23]. These prior studies have focused on investigating the genomic diversity of EPEC isolates from locations in Europe, Africa, and Asia [22,23]. While aEPEC poses a significant diarrheal burden in South America [2,9,24], to our knowledge there have been no previous comparative genomics studies that have investigated the genetic diversity of aEPEC from Brazil or other countries in South America. Thus, the goal of this study was to analyze the genetic relatedness and virulence gene content of Brazilian aEPEC isolates compared to a globally distributed collection of aEPEC.

DNA extraction, genome sequencing, and assembly
Genomic DNA of the 106 aEPEC isolates from Brazil (S1 Table) was extracted from overnight cultures using the Sigma GenElute bacterial genomic DNA kit (Sigma-Aldrich; St. Louis, MO). The genomes were sequenced using paired-end libraries on the Illumina HiSeq 4000. The 150 bp Illumina reads were assembled using SPAdes v.3.7.1 [25], and the final assemblies were filtered to contain contigs that were �500 bp in length and had �5X kmer coverage.

In silico Multilocus Sequence Typing (MLST) and serotyping
The seven loci of the multilocus sequence typing (MLST) scheme (adk, gyrB, fumC, icd, mdh, purA, and recA) developed by Wirth et al. [28] were identified in each of the genomes listed in S1 and S2 Tables using BLASTN, and MLST sequence type (ST) were determined by querying the PubMLST database (https://pubmlst.org).

Large-Scale BLAST Score Ratio (LS-BSR) analysis
The 106 Brazilian aEPEC and additional A/E E. coli genomes (S1 to S3 Tables) analyzed in this study were compared by BLASTN LS-BSR analysis as previously described [30]. Briefly, the predicted protein-encoding genes of each genome that had �90% nucleotide sequence identity and �90% alignment length were assigned to gene clusters using CD-HIT v.4.6.4 [31] (S1 Data Set). The predicted protein function of each gene cluster was determined with an ergatisbased [32] in-house annotation pipeline [33]. The presence of gene clusters with significant similarity in each of the five novel phylogenomic lineages identified in this study (EPEC11, EPEC12, EPEC13, EPEC14 and EPEC15) were defined by using an LS-BSR value �0.9; while the absence in the remaining A/E genomes were defined by using an LS-BSR value �0.4.
Additionally, we examined the occurrence of genes that were more common among the Brazilian aEPEC isolates when compared to the isolates from outside of Brazil, as well as among Brazilian and global aEPEC identified in the same phylogenomic lineage. The presence of each gene cluster was defined as having a BLASTN LS-BSR value �0.8, and differences in genes frequency among the two group of aEPEC studied (Brazilian versus Global) were tested by using the Chi-square test with Yates correlation and two-tailed Fisher's exact test.

Characteristics of the Brazilian aEPEC genomes
A total of 106 aEPEC isolates obtained from previous studies performed in Brazil [5,8,9,45] were sequenced. This collection of Brazilian aEPEC were isolated over a span of seven years from 2009 to 2016, and were associated with multiple clinical presentations, including 17 isolates from subjects with asymptomatic infection, 76 from patients with diarrhea, and 13 from diarrheal patients investigated in five distinct diarrheal outbreaks (S1 Table). The average genome size of the Brazilian aEPEC genome assemblies was 4,927,030 bp (range 4,551,111 to 5,442,868), and the average GC content was 50.53% (range: 50.13% to 50.78%), which is consistent with previous studies of sequenced E. coli genomes [22,46] (S1 Table).

Identification of genes that are unique among the novel EPEC phylogenomic lineages
We used LS-BSR to identify gene content that is present in one group of isolates and lacking in another. First, we examined the existence of genes that were present in each of the five novel EPEC lineages (EPEC11, EPEC12, EPEC13, EPEC14 and EPEC15) and absent in all other A/E E. coli genomes included in this study. A total of 936 gene clusters were found exclusively, at a  (Table 4). A summary of the LS-BSR analyses can be found in Table 4, and all genes exclusively present in each of the five novel phylogenomic lineages are listed in S4 Table.

Differential prevalence of gene content between Brazilian aEPEC compared to a global collection of aEPEC
Since so few genes were exclusively present or absent in the novel phylogenomic lineages we wanted to examine if there were any genes that were more statistically prevalent among the Brazilian aEPEC isolates when compared to the isolates obtained from locations outside of Brazil. This analysis of the gene content led to the identification of 246 genes that were statistically more frequent (P<0.001) among the Brazilian aEPEC genomes than aEPEC from other geographic origins (Brazilian aEPEC vs Global aEPEC). Included among these genes were two T3SS-effectors: espG2, gene cluster 128_11_7_153 (20.8% vs 6.8%), and espT, gene clusters 148_12_48_29 (21.7% vs 3.6%) and CYEQ01000057.1_8 (17.9% vs 3.2%). The espC gene, encodes an autotransporter protein [50], represented by gene clusters 1024_15_1_31 (27.4% vs 7.2%) and 276_12_73_9 (26.4% vs 8.1%) was also significantly more prevalent among the Brazilian aEPEC than the global aEPEC (Table 5). Conversely, 446 genes were statistically less frequent (P<0.001) among the Brazilian aEPEC when compared to the global aEPEC from outside Brazil, including genes encoding a type VI secretion ATPase (49.1% vs 69.7%), the T3SS-effectors nleA (36.8% vs 59.7%), ospB (24.5% vs 44.3%) and espW (7.5% vs 25.3%), and the IrgA homolog adhesin iha (6.6% vs 22.6%) ( Table 5). Curiously, genes encoding proteins that mediate antimicrobial resistance were also statistically less frequent (P<0.001) among the Brazilian aEPEC genomes compared to the aEPEC genomes from outside Brazil, as follows: bla TEM-1 (17.0% vs 38.9%) encoding a TEM-1 beta-lactamase, sul2 (15.1% vs 40.7%) encoding a sulfonamide resistant dihydropteroate synthase, aph(6)-Id (11.3% vs 38.9%) encoding an aminoglycoside phosphotransferase, and tetA (7.5% vs 24.0%) encoding an antibiotic efflux pump (Table 5). Other genes that were differentially detected among Brazilian and global aEPEC have predicted functions associated with central metabolism and transcriptional regulation, as well as numerous hypothetical proteins (S5 Table).

NleF-encoding gene is associated with aEPEC diarrheal disease in Brazil
Since the Brazilian aEPEC isolates studied here were obtained from patients with diarrhea (89 isolates) as well from healthy subjects (17 isolates), we decided to compare if any of the 43 genes encoding virulence factors investigated was associated with their ability to cause illness in the human host. The number of virulence factor-encoding genes detected in aEPEC isolates obtained from patients with diarrhea ranged from 3 to 23, whereas in those from healthy subjects ranged from 5 to 18 (Fig 2). Of importance, the nleF gene was significantly more prevalent among aEPEC isolates obtained from diarrheagenic patients than from healthy subjects (69.7% vs 41.2%, P<0.05) (S7 Table).

Discussion
As prior studies have focused on investigating the genomic diversity of EPEC isolates from locations in Europe, Africa, and Asia [22,23], the genomic diversity of aEPEC from Brazil or other countries in South America remains largely unexplored. Thus, in this study we describe the genomic diversity of a collection of 106 human aEPEC isolates from Brazil. We also considered the genomic diversity of the Brazilian aEPEC compared to a global collection of aEPEC isolates to gain insight into how Brazilian aEPEC may differ in their evolutionary relatedness and virulence gene content compared with aEPEC from outside of Brazil. By examining the prevalence of known EPEC virulence factors, we determined that the T3SS non-LEE encoded effectors espG2 [55] and espT [37,56], and the autotransporter gene espC [50] were statistically more prevalent among aEPEC from Brazil compared to the global aEPEC from outside Brazil. Of particular interest is EspT, which was functionally characterized in aEPEC archetype isolate E110019 from one of the more several diarrheal outbreaks associated with aEPEC as there were 650 individuals in a school and 137 associated household members in Finland that developed diarrhea [49]. EspT is responsible for triggering the formation of lamellipodia and membrane ruffles, through the activation of Rac-1 and Cdc42 [37], thus facilitating the bacterial invasion into non-phagocytic cells [56]. Although the occurrence of invasive aEPEC in the Brazilian settings have already been reported [57], no correlation between this phenotype and the presence of the espT gene has been observed to date. EspG2 is also of importance as it is involved in microtubule disruption [55]. EspC, a member of the autotransporter family, is involved in EPEC-mediated cell death and induces both apoptosis and necrosis in epithelial cells [58]. In tEPEC archetype isolate E2348/69 the espC and espG2 (orf3) are located in the same PAI, termed integrative element (IE) 5 or EspC PAI [35,59], which may explain the simultaneous dissemination of these two genes among the Brazilian aEPEC isolates. In addition, we observed that nleF was the only virulence factor-encoding gene that demonstrated a statistical association with the isolates from diarrheal disease cases when compared to Brazilian aEPEC isolates from asymptomatic individuals. NleF is a Non-LEE effector, secreted to the host cells by the T3SS, that binds to caspase-4, -8, and -9 inhibiting the catalytic activity of these caspases in vitro, and therefore, prevents apoptosis in HeLa and Caco-2 cells [60]. We also observed that the Brazilian aEPEC differ in their gene content at the phylogenomic lineage level, when compared with aEPEC from other geographic origins. In particular, the toxB gene of pO157 from O157:H7 EHEC strains, and efa1/lifA of the PAI OI-122 [54,61], were more prevalent among Brazilian aEPEC identified in the EHEC2 and EPEC10 phylogenomic lineages, respectively. The proteins encoded by the toxB and efa1/lifA genes are associated with the ability of EHEC strains to adhere to epithelial cells. Of note, the EHEC adherence factors efa1 (Z4332) and efa2 (Z4333) genes, located in the third module of the PAI OI-122 from EHEC O157:H7 strains [52], can encode proteins with amino acid sequences that are identical to the amino acids 1 to 433 (99.0% of identity) and 437 to 711 (100.0% identity) of the Efa1 adhesin identified in the EHEC O111:H - [62]. A previous study, using random transposon mutagenesis, demonstrated that the EHEC O157:H7 Sakai strain, harboring a transposon insertion in the efa1 gene, was significantly less adherent than the wild type strain in assays performed with Caco-2 cells [63]. The nucleotide sequence of the efa1 gene, from the EHEC In silico detection of known EPEC virulence factor-encoding genes in the 106 Brazilian aEPEC genomes studied. The virulence factor-encoding genes were detected in each of the 106 Brazilian aEPEC genomes analyzed in this study using large-scale BLAST score ratio (LS-BSR) analysis. Each row is a different aEPEC genome, grouped by phylogenomic lineage, and each column represents a single gene (listed in the S7 Table). The presence or absence of all predicted protein-encoding genes in the 106 Brazilian aEPEC genomes is indicated by the BSR values represented in the heatmap generated by using the heatmap2 function of gplots v.3.0.1 in R v.3.3.2. Colors of the heat map indicate virulence genes that were detected with significant similarity (blue), with divergent similarity (gray) or were absent (white) in each of the Brazilian aEPEC genomes analyzed. On the far right red circles indicate aEPEC isolates obtained from patients with diarrhea, while green circles indicate aEPEC isolates obtained from healthy subjects.
https://doi.org/10.1371/journal.pntd.0008373.g002 O111:Hstrain, is 99.0% identical to the lifA gene (also termed efa1/lifA) from the tEPEC E2348/69, that has been demonstrated to inhibit lymphocyte proliferation and lymphokine production [64]. A clinical study, that evaluated stool samples from three distinct children with diarrhea associated with an EHEC isolate with the serotype O157:HNM, demonstrated that EHEC may lose the stx2 gene during trafficking in the host [65]. This line of evidence suggests that some of the aEPEC isolates in our study, which were genomically related to EHEC lineages and/or harbored virulence genes associated with this pathotype may in fact be EHEC that have lost the Shiga toxin genes during trafficking in the host or environment, or during culture in the clinical laboratory.
Overall, this study demonstrated the aEPEC from Brazil are genomically diverse, with representatives in nearly all of the E. coli phylogroups and previously described EHEC and EPEC phylogenomic lineages, which is similar to what has been previously described for aEPEC from outside of Brazil. The Brazilian aEPEC genomes described in this study not only provide insight into the genomic diversity of aEPEC in Brazil, but also allowed the description of five novel EPEC phylogenomic lineages. We also identified important differences in gene content, including known virulence associated genes, between Brazilian aEPEC compared to a global collection of aEPEC from outside Brazil. Further investigation of the genomic diversity and prevalence of select virulence genes among additional aEPEC from Brazil as well as other locations in South America may provide insight into the greater prevalence of some of these lineages of aEPEC in association with diarrheal illness in Brazil. Atypical EPEC from the global collection was obtained from all five continents, mainly from Africa, Asia, and Europe. In parentheses is the number of aEPEC isolates obtained from each country. This world map was constructed using the free-software Q-GIS. (TIFF) S1