A New Variant of the Capsule 3 Cluster Occurs in Streptococcus pneumoniae from Deceased Wild Chimpanzees

The presence of new Streptococcus pneumoniae clones in dead wild chimpanzees from the Taï National Park, Côte d'Ivoire, with previous respiratory problems has been demonstrated recently by DNA sequence analysis from samples obtained from the deceased apes. In order to broaden our understanding of the relatedness of these pneumococcal clones to those from humans, the gene locus responsible for biosynthesis of the capsule polysaccharide (CPS) has now been characterized. DNA sequence analysis of PCR fragments identified a cluster named cps3Taï containing the four genes typical for serotype 3 CPS, but lacking a 5′-region of ≥2 kb which is degenerated in other cps3 loci and not required for type 3 biosynthesis. CPS3 is composed of a simple disaccharide repeat unit comprising glucose and glucuronic acid (GlcUA). The two genes ugd, responsible for GlcUA synthesis, and wchE, encoding the type 3 synthase, are essential for CPS3 biosynthesis, whereas both galU and the 3′-truncated gene pgm are not required due to the presence of homologues elsewhere in the genome. The DNA sequence of cps3Taï diverged considerably from those of other cps3 loci. Also, the gene pgm Taï represents a full length version with a nonsense mutation at codon 179. The two genes ugd Taï and wchE Taï including the promoter region were transformed into a nonencapsulated laboratory strain, S. pneumoniae R6. Transformants which expressed type 3 capsule polysaccharide were readily obtained, documenting that the gene products are functional. In summary, the data indicate that cps3Taï evolved independently from other cps3 loci, suggesting the presence of specialized serotype 3 S. pneumoniae clones endemic to the Taï National Park area.


Introduction
Streptococcus pneumoniae is one of the major bacterial human pathogens. Its polysaccharide capsule is an essential virulence factor [1][2][3][4][5][6]. In fact, the capsule gene cluster appears to be among the few components of S. pneumoniae described as virulence factors that distinguish the pathogen from its closest commensal relative S. mitis [7]. Up to now over 90 capsular serotypes have been described that can be distinguished immunologically by antisera specific for the capsule polysaccharide (CPS), biochemically and genetically [8][9][10][11]. All cps clusters are located at a specific region in the genome flanked by conserved sequences of the two genes dexB and aliA [10].
The capsular serotype is also an important epidemiological marker for S. pneumoniae. Clones of genetically closely related strains can be characterized by multilocus sequence typing (MLST), i.e. comparative sequence analysis of seven housekeeping genes; individual strains are thus characterized by their allelic profile, which constitutes the sequence type (ST) [12]. Generally, isolates with the same ST share the same serotype, although serotype switching occurs occasionally due to horizontal gene transfer of capsular genes [13][14][15][16].
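The comparison of allelic profiles described above can be illustrated with a minimal Python sketch. The locus names are those of the standard pneumococcal MLST scheme; the allele numbers in the usage example are hypothetical placeholders and are not the profiles of any ST discussed in this study.

```python
from typing import List, Sequence

# The seven housekeeping loci of the pneumococcal MLST scheme.
MLST_LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def differing_loci(profile_a: Sequence[int], profile_b: Sequence[int]) -> List[str]:
    """Return the names of the loci at which two allelic profiles differ."""
    if len(profile_a) != len(MLST_LOCI) or len(profile_b) != len(MLST_LOCI):
        raise ValueError("each profile must contain exactly seven allele numbers")
    return [
        locus
        for locus, a, b in zip(MLST_LOCI, profile_a, profile_b)
        if a != b
    ]


# Hypothetical allele numbers, for illustration only.
example_a = (1, 2, 3, 4, 5, 6, 7)
example_b = (1, 2, 3, 9, 9, 9, 9)
print(differing_loci(example_a, example_b))
```

Two isolates share an ST only if the returned list is empty; a list of four names corresponds to a difference in four out of seven alleles.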
S. pneumoniae is considered to be a human-specific pathogen. Nevertheless, pneumococci have been isolated from a variety of animals held in captivity (pets, zoo or laboratory animals), either as carriage isolates or causing a variety of disease symptoms [17][18][19][20][21][22][23]. There is only one case where S. pneumoniae was demonstrated in wild animals [24]. DNA sequencing using samples obtained from deceased wild chimpanzees from the Taï National Park revealed genes encoding typical S. pneumoniae proteins such as the major autolysin LytA, pneumolysin Ply, and the penicillin binding protein 2x (PBP2x). Moreover, MLST analysis identified two new clones that have not been found within the human population, including workers on the Taï chimpanzee project. The closest human isolates differed in four out of seven alleles, and it has been suggested that S. pneumoniae virulent to great apes occur endemically in this area [24].
Since live bacteria could not be isolated from the wild chimpanzees, we have used DNA samples from three apes covering both STs to investigate the capsular type of the S. pneumoniae clones. Recently, a multiplex PCR scheme has been developed to differentiate the 29 serotypes most common in the US [25]. In the present study a modified system was used which covers the serotype distribution in Africa (http://www.cdc.gov/ncidod/biotech/strep/pcr.htm). The results document the presence of genes involved in CPS of type 3 in all samples. Comparison with known sequences of the cps3 locus from human isolates revealed major differences. Transformation experiments were performed using the laboratory strain R6 as recipient to verify the function of these genes.

PCR amplification of the cps cluster from chimpanzee samples
In order to identify genes related to biosynthesis of the pneumococcal capsule, first a multiplex PCR was applied on DNA samples obtained from three chimpanzees (here referred to as 'Taï' samples) representing the three ape communities and the two S. pneumoniae clones identified by MLST analysis previously [24]. Each of the seven PCR reactions includes four to five primer pairs specific for distinct cps clusters. In addition, each reaction contains one primer pair which is specific for the gene cpsA (wzg), which is present in all cps clusters and thus serves as positive control [25]. Forty serotype specificities are covered by a modified version that includes clinical specimens from Africa (http://www.cdc.gov/ncidod/biotech/strep/pcr.htm). Each serotype gives rise to one DNA fragment in only one of the PCR reactions. The size of the PCR fragment specifies the cps cluster, and the serotype has to be confirmed by DNA sequence analysis.
An approximately 0.4 kb DNA fragment was obtained with all Taï samples in one of the multiplex reactions (for example, see lane 4 in Fig. 1A). However, no product corresponding to the expected cpsA fragment was detected in any of the PCR reactions, suggesting some unusual composition of the cps Taï cluster. One PCR reaction resulted in several DNA fragments which did not correspond to any of the potential products, and these were not investigated further (lane 2 in Fig. 1A). DNA sequencing identified the same 371 nucleotide (nt) sequence in all three Taï samples, corresponding to a galU fragment typical for the S. pneumoniae cps3 cluster (also named cps3U or cap3C [26,27]). In this context it should be pointed out that the nomenclature proposed by Bentley et al. for cps genes is used throughout the manuscript [10].
In order to understand why the control cpsA fragment was not obtained, and to gain more information about the genetic arrangement of the cps Taï cluster, a long-range PCR reaction was performed to obtain the DNA sequence of the entire cps3 Taï cluster. Primers specific for the genes dexB (spr0310) and aliA (spr0327) which are flanking all S. pneumoniae cps clusters were used. The PCR products from all three Taï samples were approximately 8 kb long (for example, see Fig. 1B). However, the cps3 region of strain S. pneumoniae SP3-BS71, a representative of a major type 3 clone of ST180 whose genome sequence is available, is predicted to be 12.8 kb [28], and of another type 3 S. pneumoniae 524/62 of unknown ST is 10.3 kb [10], a variation due to the presence of highly variable transposase fragments. The smaller size of the Taï PCR product suggests either a modified cps3 cluster with large deletions, or the presence of a novel capsular type in the Taï samples. DNA sequence analysis of all three 8 kb fragments clearly identified the four genes specifying the cps3 cluster, and all samples produced identical DNA sequences. However, the cps3 Taï region bears special features as outlined below.
DNA sequence analysis of the cps3 Taï cluster
The cps3 cluster can be divided into three regions (Fig. 2) [10,26,29]. The first region contains sequences common to all serotypes (region I in Fig. 2), but is not required in cps3 since it is mutated and contains mainly pseudogenes of variable size. This entire region I is missing in cps3 Taï, which explains the smaller size of the PCR product and the failure to detect the control wzg fragment in the multiplex PCR.
Region II contains the two genes essential for biosynthesis of the type 3 capsule, which is composed of cellobiuronic acid units connected in a β(1→3) linkage [30]: ugd, encoding the UDP-glucose dehydrogenase responsible for UDP-glucuronic acid (UDP-GlcUA) synthesis, and the type 3 synthase gene wchE, encoding a processive β-glucosyltransferase linking the alternating glucose and GlcUA moieties (Fig. 3) [27,29,31]. WchE represents the simplest synthesis and export pathway for CPS. In cps3 Taï, region II is intact.
Region III contains the two genes galU and pgm. GalU and Pgm are required for synthesis of UDP-Glc (Fig. 3), a precursor for all capsular types and for other cell wall polymers as well. These two genes are not essential for CPS3 biosynthesis since homologues of both genes occur elsewhere in the pneumococcal genome, here referred to as galU 2 and pgm 2 [27,32]. Also, pgm within the cps3 cluster is truncated, and the putative product is probably non-functional due to the lack of a C-terminal domain important for phosphomutase activity [29,33].
Region III of cps3 Taï contains, downstream of galU, a pgm homologue which has some peculiar properties. The DNA sequence of pgm Taï reveals a full size gene (1740 nt) similar in length to pgm 2 (1719 nt), in contrast to e.g. pgm SP3-BS71 (1218 nt). A mutation within the ATG pgm start codon in combination with a single nucleotide deletion four nucleotides upstream results in an N-terminal sequence of the putative pgm Taï gene product that is extended by 8 amino acids (aa); these mutations also affect galU so that it lacks the last codon. Moreover, the pgm codon 179 is changed into a premature stop codon, resulting in a pseudogene. The DNA sequence of the 5′-region of pgm Taï is highly related to pgm SP3-BS71 (1.2% difference), but differs largely from pgm 2 SP3-BS71 (26%). Interestingly, BLAST analysis revealed a homologue, pgmA of S. dysgalactiae subsp. equisimilis, of similar size (1719 nt) which differed by only 4.5% (Fig. 4 and S1; Table 1). This strongly suggests that the pgm gene which is located in the cps3 cluster is an orthologue of the chromosomal gene pgm 2 of S. pneumoniae.
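How a premature stop codon such as the one at pgm codon 179 can be located in an open reading frame is shown in the following minimal Python sketch. The toy sequence is invented for illustration and is not part of the pgm Taï sequence; the function name is likewise an assumption of this sketch.

```python
from typing import Optional

# Standard stop codons of the bacterial genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(orf: str) -> Optional[int]:
    """Return the 1-based codon number of the first in-frame stop codon.

    The sequence is read in frame from its first nucleotide; None is
    returned if no in-frame stop codon is present.
    """
    seq = orf.upper()
    for start in range(0, len(seq) - 2, 3):
        if seq[start:start + 3] in STOP_CODONS:
            return start // 3 + 1
    return None


# Invented toy ORF: ATG AAA TAG GGG, stop at codon 3.
print(first_stop_codon("ATGAAATAGGGG"))
```

Applied to a full length gene read from its start codon, a return value smaller than the expected number of codons indicates a nonsense mutation.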
In addition to the SP3-BS71 cps3 sequence [28], another three sequences of S. pneumoniae cps3 clusters are available: those from strains 524/62 [10], 406 [26], and WU2 [27]. The genes of all these clusters are closely related to each other, differing by 1-4 nucleotides per gene except for a short highly variable region within wchE of the WU2 strain. However, they all differed considerably from the cps3 Taï sequence, with 9 to 18 bp changes per gene (Fig. S2 and Table 1). Figure 5 shows a neighbour joining tree for the 3823 bp region including the promoter and ugd/wchE/galU, clearly documenting that the cps3 Taï genes are more distantly related to any of the human samples than these are to each other.
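Counting nucleotide differences between two aligned sequences, as in the per-gene comparison above, can be sketched in a few lines of Python. This is a simplified illustration: it reports raw differences and the uncorrected p-distance, whereas the tree in this study is based on Maximum Composite Likelihood distances computed in MEGA4. Alignment columns containing a gap in either sequence are skipped, in the spirit of the complete deletion option. The toy sequences are invented.

```python
from typing import Tuple


def count_differences(seq_a: str, seq_b: str, gap: str = "-") -> Tuple[int, int]:
    """Return (differences, compared sites) for two aligned sequences.

    Columns with a gap in either sequence are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among the compared (ungapped) sites."""
    differences, compared = count_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


# Invented toy alignment: one substitution among five ungapped columns.
print(count_differences("ACGT-A", "ACCTTA"))
print(p_distance("ACGT-A", "ACCTTA"))
```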
In the regions flanking the cps3 cluster, differences between the ape and the human samples are also noteworthy. In the 3′-region flanking aliA Taï, a large 1.6 kb deletion has occurred (see Fig. 2). However, the aliA gene in the genome of SP3-BS71 is also affected, since the integration of a transposase resulted in deletion of a small part of the aliA 5′-region.
Expression of the cps3 genes is driven by a strong promoter upstream of ugd ([34] and references therein), and 112 bp of cps3 Taï correspond to this region. There are 12 alterations (10 substitutions, one deletion of A-48 and one insertion of T-23, with A+1TG representing the start codon of ugd). However, they do not include a mutation described within the -35 region that affects expression considerably [34], and the +1 position as well as the -10 and -35 regions are well conserved in the cps3 Taï sample. It is therefore likely that all genes in the cps3 Taï cluster can be expressed.
Transformation of the unencapsulated S. pneumoniae R6 strain with cps3 Taï
In order to see whether the genes of the cps3 Taï cluster can be expressed from their promoter, and whether they indeed encode functional products, a 3 kb PCR fragment including the promoter region plus ugd and wchE was ligated into pSW1 as described in the Materials and Methods section. The ligation mixture was then used to transform the unencapsulated laboratory strain R6, which contains a deletion in its cps2 cluster [35]. Since S. pneumoniae R6 contains pgm 2 and galU 2, corresponding to spr1351 and spr1903, respectively, their functions were expected to complement the enzymatic machinery required for CPS3 synthesis. The ligation mixture was used as donor DNA, since wildtype colonies resulting from transformation with the religated vector fragment should be easily distinguishable from transformants containing the vector plus the 3 kb fragment and thus expressing a polysaccharide capsule. Trimethoprim resistant colonies were obtained readily, and indeed two types of colonies were apparent: approximately 40% showed no difference to the small colonies of the parental strain R6, whereas 60% had a striking mucoid phenotype typical for the type 3 capsule (Fig. 6A). This phenotype was stably maintained during several passages of single colonies (Fig. 6B). The presence of capsular material of type 3 was further verified using type 3 antiserum in a Quellung reaction (not shown). Integration of the 3 kb fragment into the bgaA locus was confirmed in six mucoid colonies by PCR using primers flanking the integration site. Thus, ugd Taï and wchE Taï in combination with pgm 2-R6 and galU 2-R6 were sufficient to drive biosynthesis of the capsule 3 polysaccharide.

Discussion
The presence of the S. pneumoniae specific cps3 cluster in samples from dead wild apes confirmed the presence of pneumococci in the deceased animals. The samples investigated here represent both clones that were identified previously, STs 2308 and 2309 [24], and were taken seven years apart. Although the allelic profiles of the two clones are completely distinct, they contained identical DNA sequences of the cps3 cluster that differed largely from that of other type 3 isolates. It is also remarkable that among the over 6300 STs listed in the MLST database in April 2011, no human isolate has the same ST as the chimpanzee associated S. pneumoniae; all differ in at least four out of the seven alleles used for MLST. Several distinct STs for type 3 isolates are known, with ST458 predominating in South Africa [36], whereas ST180 is the dominant clone in many other countries [37][38][39][40]. The unique cps3 Taï sequence adds further evidence that the two clones in the Taï National Park occur endemically, and suggests some selective advantage favouring recent acquisition of this CPS type. Serotype 3 is among the serotypes with the highest invasive capacity in humans [41], and it is thus likely that S. pneumoniae played a substantial role in causing the death of the chimpanzees even though other pathogens have probably contributed to the disease [24].
The capsule is one of the major virulence factors of S. pneumoniae. Clones associated with animals held in captivity or as pets expressed many different serotypes, and most clones were identical to human isolates. However, guinea pigs seemed to be infected by a new clone of serotype 19F [17], and new clones of serotype 3 were isolated from racing horses [17,22]. The identification of serotype 3 clones in wild animals described in the present manuscript is another example suggesting that specialized S. pneumoniae clones can be associated with animals. It has been suggested that the animal host of the Taï clones is not the chimpanzee but small rodents or monkeys that are part of the ape's diet [24]. The reason for the persistence of the S. pneumoniae clones in the Taï National Park is not clear. We do not believe that the capsule itself is involved in this property, since there is no indication that the capsule of the Taï samples is biochemically distinct from the known type 3 structure. It is more likely that other genomic components of these pneumococcal clones are responsible for their capacity to persist in this area. Also studies on the virulence potential of these clones have to await the isolation of the bacteria which has not been possible so far.
There are only four genes required for biosynthesis of CPS3 (Fig. 3). The two genes ugd (UDP-Glc dehydrogenase) and wchE (CPS3 synthase) involved in the last two steps are essential. The other two genes located in the cps3 locus (pgm, catalyzing the production of Glc-1-P from Glc-6-P, and galU, converting Glc-1-P to UDP-Glc) are dispensable, since homologues galU 2 and pgm 2 are present elsewhere in the S. pneumoniae genome. It is peculiar that not only can the truncated pgm gene within the cps3 cluster be deleted without affecting CPS3 production, but that deletion of galU also has no effect, whereas mutants in galU 2 or pgm 2 produced almost no CPS3 and were strongly affected in virulence [32,42]. This documents that it is the two genomic genes outside the cps locus that are mainly involved in CPS3 biosynthesis rather than their homologues in the cps3 cluster. The fact that transformation of the ugd-wchE Taï region into the unencapsulated S. pneumoniae R6 strain results in type 3 colonies, as shown here, documents that the simultaneous absence of both genes galU and pgm in the cps locus has no apparent impact on CPS3 production and thus clearly defines the minimal size of the cps3 region required for CPS3 synthesis. It also proves that the cps3 Taï cluster is functional despite considerable alterations in the promoter region as well as in ugd and wchE.
The comparative DNA sequence analysis of cps3 Taï revealed several features that document an evolutionary history distinct from all other known cps3 loci. RFLP analysis of restriction digests from WU2 and another four type 3 strains confirmed a high degree of uniformity of this locus, including the transposon upstream of the aliA gene flanking the cps cluster [29]. However, cps3 Taï is at least 2 kb shorter due to the absence of region I (see Fig. 2), and a 3′-region that includes a transposon as well as substantial parts of aliA is also missing. The aliA gene is generally truncated in cps3 clusters [29]. Probably aliA is not required in S. pneumoniae due to the presence of several other related oligopeptide permease genes [43]. Nevertheless, aliA mutants of the type 2 strain D39 have been shown to colonize the nasopharynx considerably less efficiently [44], and thus other factors might compensate for this defect in the serotype 3 isolates of high virulence potential.
The four cps3 loci for which sequence information is available are more similar to each other than they are to cps3 Taï (Figs. 5, S1, S2 and Table 1). Furthermore, the pgm Taï gene is unique in that it represents a full size homologue, in contrast to the truncated pgm versions in the other cps3 loci including those found among recently shotgun sequenced S. pneumoniae isolates (http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi), and again the pgm Taï gene differs more from all others than these do from each other (Fig. 4). Remarkably, the G+C content of pgm (41.3%) resembles that of S. pneumoniae genomes and other streptococci, whereas the G+C content of the other cps3 genes is significantly lower (34-37%), similar to CPS synthesizing genes in other cps loci [10]. In summary, two conclusions can be drawn from these data. The cps3 cluster contains genes from at least two sources as judged from the G+C content. Furthermore, cps3 Taï has evolved separately for a certain time period before diversification of the other cps3 clusters occurred, resulting in a higher percentage of mutations and distinct deletion events. The availability of sequences from other type 3 S. pneumoniae clones would be desirable to broaden our understanding of the evolution of this cluster.
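The G+C comparison used above to infer different origins of the cps3 genes rests on a simple calculation, sketched here in Python. The function name and the toy sequences are assumptions for illustration; only unambiguous bases (A, C, G, T) are counted.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence in percent.

    Ambiguous symbols (e.g. N) and gaps are ignored.
    """
    bases = [base for base in seq.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


# Invented toy sequences.
print(gc_content("ATGC"))
print(round(gc_content("GGCCAT"), 1))
```

Computing this value gene by gene across a cluster is one way to spot segments whose base composition deviates from that of the surrounding genes.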

Bacterial strains and media
S. pneumoniae R6, a nonencapsulated derivative of the Rockefeller University strain R36A [45], was used for transformation experiments. Cells were grown at 37°C without aeration in C-medium [46] supplemented with 0.2% yeast extract (Difco) or on blood agar plates (D-agar supplemented with 3% defibrinated sheep blood (Oxoid)) [47]. Growth in liquid culture was monitored by nephelometry.
Escherichia coli strain DH5α was used for propagation of plasmid pSW1. E. coli strains were grown aerobically at 37°C either in LB medium or on LB agar plates [48]. Plasmid pSW1 was selected in E. coli with 200 µg/ml ampicillin.

Transformation procedure
Transformation of S. pneumoniae R6 strains was performed according to published procedures [49]. Transformants containing pSW1 were selected with trimethoprim at 15 µg/ml.

DNA manipulations
All DNA techniques were performed using standard methods [48]. Multiplex PCR for 39 capsular serotypes/serogroups was performed by using seven sequential reactions as described by Pai et al. [25]. The primer sets specific for clinical specimens from Africa were used as described by the CDC (http://www.cdc.gov/ncidod/biotech/strep/pcr.htm). PCR reactions were performed using GoldStar Taq polymerase (Eurogentec) according to the manufacturer's instructions. DNA isolated from lung tissue samples of three deceased chimpanzees was used: Loukoum (1999, North community, ST2308), Candy (2006, East community, ST2308) and Ophelia (2004, South community, ST2309) [24]. The cps cluster was amplified using primers located in the genes dexB (dexB-for CATCATGGACTTGGTGGTCAATCATACCTCGGATGAG) and aliA (aliA-rev TAGACAAGATTGGACGCCCTGTACGAGATGTAGTTGG). Long-range PCR was performed using high-fidelity iProof polymerase (Bio-Rad) according to the manufacturer's instructions. The amplified products were sequenced by primer walking. PCR products were purified using the PCR clean-up gel extraction kit (Macherey-Nagel). Chromosomal DNA was isolated from S. pneumoniae as described previously [50]. Plasmids from E. coli were isolated using the QIAprep Spin Miniprep kit (Qiagen). Restriction nucleases and T4 DNA ligase were purchased from BioLabs and used according to the recommendations of the suppliers.

Construction of R6bgaA::udg-wchE
The region covering the promoter and the two genes ugd and wchE essential for type 3 capsular polysaccharide biosynthesis was PCR amplified using oligonucleotides pDD01 (CGCGGATCCACCGATAGTGTGGTTAATGTTG) and pDD02 (CTAGCTAGCCCAGCCCTGCTGCAGGAATAACAG), treated with BamHI and NheI, and ligated to pSW1 previously digested with the same enzymes. The ligation mixture was used to transform S. pneumoniae R6, and trimethoprim resistant colonies were selected. Approximately 60% of the transformants displayed mucoid colony appearance, and correct integration of the insert into the genome was confirmed by PCR. Plasmid pSW1 contains a pBR322-derived origin of replication for replication in E. coli but not in S. pneumoniae; details will be described elsewhere. Briefly, it carries a trimethoprim resistance marker [51] which can be used for selection of the transformants in S. pneumoniae, and the β-lactamase gene (bla), which confers ampicillin resistance in E. coli. Genes of interest can be cloned via multiple cloning sites with recognition sequences for KpnI, SmaI, XbaI, BamHI, SalI. Flanking regions are homologous to S. pneumoniae sequences, allowing integration into the chromosome by double crossover at the bgaA locus, thereby replacing an intergenic region between bgaA and the adjacent gene spr0566.
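Whether the cloning primers carry the intended restriction sites can be checked with a short Python sketch. The primer sequences are those given above for pDD01 and pDD02, with the hyphens in the printed strings treated as line-break artifacts (an assumption of this sketch). The recognition sequences used are the standard ones for BamHI (GGATCC) and NheI (GCTAGC); the function name is hypothetical.

```python
from typing import Dict

# Standard recognition sequences of the two enzymes used for cloning.
SITES = {"BamHI": "GGATCC", "NheI": "GCTAGC"}

# Primer sequences as listed in the text, hyphens removed (assumed artifacts).
PRIMERS = {
    "pDD01": "CGCGGATCCACCGATAGTGTGGTTAATGTTG",
    "pDD02": "CTAGCTAGCCCAGCCCTGCTGCAGGAATAACAG",
}


def find_sites(primer: str) -> Dict[str, int]:
    """Map each enzyme whose site occurs in the primer to the 0-based
    position of the first occurrence of its recognition sequence."""
    seq = primer.upper()
    return {
        enzyme: seq.find(site)
        for enzyme, site in SITES.items()
        if site in seq
    }


for name, sequence in PRIMERS.items():
    print(name, find_sites(sequence))
```

In both primers the site follows a three-nucleotide 5′ overhang, which is consistent with the BamHI/NheI digestion of the PCR product described above.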
Quellung reaction. The strains were serotyped by Quellung reaction using type serum 3 provided by the Statens Serum Institut, Copenhagen, Denmark [52].
Phylogenetic analysis. The evolutionary history of cps genes was inferred using the Neighbour-Joining method [53]. Evolutionary distances were computed using the Maximum Composite Likelihood method [54]. All positions containing gaps and missing data were eliminated from the dataset (complete deletion option). Phylogenetic analyses were conducted in MEGA4 [55].
Nucleotide sequence accession number. The DNA sequence described here (cps3 Taï) is deposited in GenBank under accession No. JF836868.
Figure S1 Comparative sequence analysis of the Pgm Taï gene. Shown are sites where at least one sequence differs from the reference sequence. A: nucleotide sequence; B: amino acid sequence. The codon numbers (amino acids) are indicated vertically in the first three rows; sites 1, 2 and 3 refer to the first, second and third positions in the respective codon. The authentic stop codon in pgm Taï occurs at codon 179 and is indicated by (*) in the amino acid alignment. The sequences from the following strains were used (Acc