Population Genetic Structure of Streptococcus pneumoniae in Kilifi, Kenya, Prior to the Introduction of Pneumococcal Conjugate Vaccine

Background
The 10-valent pneumococcal conjugate vaccine (PCV10) was introduced in Kenya in 2011. Introduction of any PCV will perturb the existing pneumococcal population structure, so our aim was to genotype pneumococci collected in Kilifi before PCV10 introduction.

Methods and Findings
Using multilocus sequence typing (MLST), we genotyped >1100 invasive and carriage pneumococci from children, the largest such collection from a single resource-poor country reported to date. Serotype 1 was the most common serotype causing invasive disease and was rarely detected in carriage; all serotype 1 isolates were members of clonal complex (CC) 217. There were temporal fluctuations in the major circulating sequence types (STs): although 1-3 major serotype 1, 14 or 23F STs co-circulated annually, the two major serotype 5 STs mainly circulated independently. Major STs/CCs also included isolates of serotypes 3, 12F, 18C and 19A, and each shared ≤2 MLST alleles with STs that circulate widely elsewhere. Major CCs associated with non-PCV10 serotypes were predominantly represented by carriage isolates, although the serotype 19A and 12F CCs were largely invasive and a serotype 10A CC was equally represented by invasive and carriage isolates.

Conclusions
Understanding the pre-PCV10 population genetic structure in Kilifi will allow the detection of changes in the prevalence of the circulating genotypes, and of evidence for capsular switching, after vaccine implementation.


Introduction
The introduction of pneumococcal conjugate vaccine (PCV) into immunisation programmes in well-resourced countries led to a significant reduction in pneumococcal morbidity and mortality [1,2]. Children in resource-poor countries have a much higher incidence of life-threatening pneumococcal disease [3,4], so the World Health Organisation recommended that PCV be introduced into developing countries with high childhood mortality, and the GAVI Alliance has provided support for PCV introduction [5]. The burden of pneumococcal disease among young children living in Kilifi District, on the coast of Kenya, is very high: the annual incidence of clinically significant pneumococcal bacteraemia among children <5 years of age presenting to the outpatient department of Kilifi District Hospital was estimated at 436 cases per 100,000 [6], and among children <1, <2 and <5 years of age admitted to hospital the incidence was 241, 213 and 111 cases per 100,000, respectively [7]. Nasopharyngeal carriage prevalence among healthy children is also high: the overall population-based prevalence in a pre-PCV10 study was 66% (79% among children <1 year of age and 51% among children 4.5-5.0 years of age) [8]. Therefore, in January 2011, Kenya introduced the 10-valent PCV (PCV10), which contains serotypes 1, 4, 5, 6B, 7F, 9V, 14, 18C, 19F and 23F, into its childhood immunisation programme. The proportion of isolates expressing PCV10 serotypes (PCV10 coverage) was estimated to be 42% among carriage isolates and >70% among invasive pneumococci [8,9]. In January and March 2011 the Kenyan Government conducted a two-dose catch-up campaign for all children aged <5 years in Kilifi District to accelerate the establishment of conditions resembling a mature vaccination programme. Population-based surveillance, established at Kilifi District Hospital in 2001, will be used to evaluate the impact of PCV10 introduction.
In well-resourced countries, PCV introduction led to a significant overall reduction in the incidence of invasive pneumococcal disease due to vaccine serotypes. PCV also led to a profound reduction in the prevalence of nasopharyngeal carriage of vaccine serotypes among healthy children, but with a compensatory rise in the prevalence of nonvaccine serotypes. This fundamentally changed serotype transmission patterns, which in turn affected disease: reduced transmission of vaccine serotype pneumococci produced a herd-protection effect that benefited older unvaccinated individuals, whereas increased circulation of nonvaccine serotypes led to 'serotype replacement disease', which attenuated the net benefit of PCV introduction. However, many nonvaccine serotypes appear to be inherently less invasive than vaccine serotypes, so in most populations the reduction in invasive disease caused by vaccine serotypes has exceeded the increase in serotype replacement disease [10-18]. Because serotype and multilocus sequence typing (MLST) genotype are closely associated, with known exceptions, the changes in circulating serotypes in developed countries were accompanied by concomitant changes in the circulating genotypes [14,19-22]. Much of the attention centred on genotypes associated with serotype 19A, the predominant nonvaccine serotype that increased in prevalence after introduction of the 7-valent PCV (PCV7), but the prevalence of other genotypes changed as well [13,23,24].
In resource-poor countries, much less is known about which serotypes and genotypes circulate at a population level, so the impact of PCV introduction on the pneumococcal population structure, and the potential for evolution in response to vaccine selective pressure, cannot easily be predicted from the experience of introducing PCV in well-resourced countries. Therefore, in this study we used MLST to provide the first large-scale characterisation of >1100 invasive and colonising pneumococcal isolates from a single resource-poor country, with the aim of revealing the population structure before PCV10 introduction.

Ethical Statement
The project SSC#1357, entitled "The effect of routine immunization with Pneumococcal Conjugate Vaccine in children on the strain structure of invasive and carriage isolates of S. pneumoniae in children and adults in Kilifi District", was reviewed by the KEMRI National Ethical Review Committee, which approved the analysis of pneumococcal isolates collected through routine surveillance at Kilifi District Hospital. The Committee also approved the analysis of carriage isolates collected in specific research studies; for these studies, individual informed consent was obtained from every parent or guardian. All data in this genotyping study were analysed anonymously.

Selection of Isolates for Genotyping
Invasive isolates were recovered from the blood, cerebrospinal fluid or pleural fluid of ill children <15 years of age from 1994 to 2008 (total n = 628). All available unique-patient isolates from 1994-2002 were included; for 2003-2008, isolates were systematically selected (e.g. every other isolate in the line listing of isolates) to obtain a sample of 25 or 50 unique-patient isolates per year for genotyping. Three selected isolates from 2003-2008 were mixed or nonviable and were not genotyped.
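Systematic selection of the kind described above (e.g. every other isolate in the line listing) can be sketched as follows. This is a minimal illustration, assuming the line listing is an ordered list of unique-patient isolate IDs and that selection starts at the first entry; the function name and IDs are hypothetical, and the study's exact starting point and sampling interval for each year are not reported.

```python
def systematic_sample(line_listing, target):
    """Select up to `target` isolates at a fixed interval from an ordered listing.

    Hypothetical helper: takes every k-th isolate with k = len(listing) // target,
    starting at the first entry. If the listing holds fewer than `target`
    isolates, every isolate is kept (k is never allowed to fall below 1).
    """
    if target <= 0:
        return []
    step = max(1, len(line_listing) // target)
    return line_listing[::step][:target]


# Hypothetical line listing of 100 unique-patient invasive isolates for one year.
listing_2005 = [f"INV-2005-{i:03d}" for i in range(1, 101)]
sample_2005 = systematic_sample(listing_2005, 50)  # every other isolate
```

With 100 isolates and a target of 50 the interval is 2, which reproduces the "every other isolate" example; a year with fewer isolates than the target would simply contribute all of them.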
Carriage isolates (n = 486) were collected during two previous studies conducted in Kilifi District. The first study, performed in 2004, sampled the nasopharynx of healthy persons of all age groups [25]; all available unique-patient isolates recovered from children <5 years of age (n = 170) were included in the current genotyping study. The second study, a rolling cross-sectional study of nasopharyngeal carriage performed from 2006 to 2008, sampled healthy children <5 years of age; 320 isolates were randomly selected for genotyping from the entire collection of 1868 isolates, and 4 of these were later excluded because their freezer stocks were nonviable [8].
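The random selection from the 2006-2008 collection, and the resulting carriage total, can be illustrated with a simple random sample drawn without replacement. This is a sketch under stated assumptions: the isolate IDs are hypothetical placeholders, the seed is arbitrary, and because the source does not say which four freezer stocks were nonviable, the first four selected IDs stand in for them.

```python
import random


def random_subsample(isolate_ids, n, seed=None):
    """Draw n isolates without replacement (simple random sample)."""
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return rng.sample(isolate_ids, n)


# Hypothetical IDs standing in for the 1868 isolates of the 2006-2008 study.
collection = [f"CAR-{i:04d}" for i in range(1, 1869)]
selected = random_subsample(collection, 320, seed=2008)

# Placeholder only: which 4 stocks were nonviable is not reported.
nonviable = set(selected[:4])
genotyped_2006_2008 = [iso for iso in selected if iso not in nonviable]

# 170 isolates from the 2004 study plus 316 from the 2006-2008 study.
carriage_total = 170 + len(genotyped_2006_2008)
```

Recording the seed alongside the selection lets the same subsample be regenerated later, which is useful when freezer stocks must be re-retrieved.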
In all studies, samples were cultured and pneumococci were identified using standard microbiological methods. Isolates were serogrouped by latex agglutination and serotyped by the Quellung method in the Kilifi Laboratory. All invasive isolates were also serotyped by PCR using a published protocol, modified to better suit the Kilifi serotype distribution [26]. Isolates with discrepant Quellung and PCR results were retested until consensus was achieved, and the consensus serotypes were used in this study.
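Identifying which invasive isolates need retesting amounts to comparing the two sets of serotype calls. The sketch below assumes each method's results are held as a mapping from a hypothetical isolate ID to a serotype string; the function name and example calls are illustrative, not the study's data or software.

```python
def flag_discordant(quellung_calls, pcr_calls):
    """Return sorted IDs of isolates typed by both methods whose calls disagree.

    Both arguments map isolate ID -> serotype string. Isolates typed by only
    one method are not compared here.
    """
    typed_by_both = quellung_calls.keys() & pcr_calls.keys()
    return sorted(iso for iso in typed_by_both if quellung_calls[iso] != pcr_calls[iso])


# Hypothetical calls for three invasive isolates.
quellung = {"INV-001": "1", "INV-002": "14", "INV-003": "23F"}
pcr = {"INV-001": "1", "INV-002": "14", "INV-003": "19F"}
to_retest = flag_discordant(quellung, pcr)  # ["INV-003"]
```

In practice the flagged list would be re-run until the calls agree, and only the agreed (consensus) serotype would be carried forward.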

MLST, Data Confirmation and Analyses
MLST was performed according to the S. pneumoniae MLST protocol [27] and alleles and sequence types (STs) were assigned using the MLST website [19]. Unusual combinations of serotype and genotype were verified by repeating the genotyping and/or confirming the serotype. Serotype confirmation was done either by repeat testing in Kilifi or by PCR serotyping in Oxford using PCR serotyping primers and protocols adapted from previously published methods [26,28]. We were unable to PCR-amplify the gdh locus for one invasive serotype 23F isolate and the ddl locus for one carriage serotype 19F isolate despite repeated attempts and redesigned primers (presumably due to divergent sequence in the primer binding regions), so these two isolates were removed from genotyping analyses. The data were compiled and analysed using Microsoft Excel and STATA v. 11. STs were clustered into clonal complexes (CCs) using Phyloviz [29] with the following settings: Dataset type, Multi-locus Sequence Typing; Distance, eBURST Distance; and Level, SLV. The Kilifi dataset was combined with the entire MLST database (as of February 2012; total combined n = 16,070 isolates) for the Phyloviz analyses. The entire Kilifi dataset was also submitted to the MLST database.
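Phyloviz was run at the SLV level, which links STs whose allelic profiles differ at a single locus. The sketch below approximates that grouping as connected components of an SLV graph, built with a union-find structure. It is an illustration under assumptions, not the Phyloviz/goeBURST algorithm: goeBURST's tie-break rules and its prediction of the founder (ancestral) ST are not reproduced, and the allelic profiles in the example are hypothetical.

```python
from itertools import combinations

# The seven S. pneumoniae MLST loci, in allelic-profile order.
LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def locus_differences(profile_a, profile_b):
    """Number of MLST loci at which two allelic profiles differ."""
    if len(profile_a) != len(LOCI) or len(profile_b) != len(LOCI):
        raise ValueError("profiles must have one allele per MLST locus")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def slv_clonal_complexes(profiles):
    """Group STs into SLV-linked clusters (connected components).

    `profiles` maps ST -> 7-tuple of allele numbers in LOCI order. Two STs are
    linked if they differ at exactly one locus; a cluster holds every ST
    reachable through a chain of such links. The all-pairs comparison is
    O(n^2): fine for a sketch, slow for very large datasets.
    """
    parent = {st: st for st in profiles}

    def find(st):
        # Union-find root lookup with path halving.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) == 1:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for st in profiles:
        clusters.setdefault(find(st), set()).add(st)
    # Largest clusters first; ties broken by the lowest ST number.
    return sorted(clusters.values(), key=lambda c: (-len(c), min(c)))


# Hypothetical allelic profiles (not real MLST data).
profiles = {
    101: (1, 1, 1, 1, 1, 1, 1),
    102: (1, 1, 1, 1, 1, 1, 2),  # SLV of 101
    103: (2, 1, 1, 1, 1, 1, 1),  # SLV of 101, DLV of 102
    201: (5, 5, 5, 5, 5, 5, 5),  # unrelated singleton
}
complexes = slv_clonal_complexes(profiles)
```

Note that STs 102 and 103 end up in the same cluster even though they are DLVs of each other, because both are SLVs of ST101; this chaining is why combining the Kilifi data with the whole MLST database can merge STs that would otherwise appear unrelated.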

Serotype and Sequence Type Distributions and Identification of Clonal Complexes
1114 invasive and carriage isolates were recovered from children in Kilifi from 1994-2008 and genotyped by MLST (Table 1). Isolate representatives of all PCV10 serotypes were found; 8 of the 10 most abundant serotypes in the invasive collection were PCV10 serotypes, as were 5 of the 10 most common serotypes in the carriage collection. Although serotype 7F was a minor serotype in this collection (n = 3 isolates), all other PCV10 serotypes were well represented (29-161 isolates characterised per serotype). Vaccine serotypes 1, 6B, 14 and 19F were each represented by >100 isolates.
We identified 21 major CCs (>15 isolates each) in the entire collection of pneumococci: 13 comprised isolates of PCV10 serotypes and 8 comprised isolates of non-PCV10 serotypes (Tables 2 and 3). An initial observation was that the STs and CCs circulating in Kilifi were largely different from those circulating in most developed countries (as reported to the MLST database [19]). We submitted 40 novel alleles and 212 novel STs to the MLST database; the novel STs accounted for 26.3% of the invasive isolates and 30.9% of the carriage isolates. The clonal complexes and sequence types found in the invasive and carriage collections, stratified by serotype, are described in Tables S1 and S2, respectively.
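The ">15 isolates" threshold and the split by PCV10 serotype can be expressed as a short tally. This sketch assumes each CC-serotype combination (e.g. CC289 with serotype 5) is counted separately; the input format and function name are hypothetical. The counts for CC217/serotype 1, CC289/serotype 5 and CC989/serotype 12F are taken from this study, whereas "CC-X" is an invented placeholder just below the threshold.

```python
from collections import Counter

# PCV10 serotypes, as listed in the Introduction.
PCV10_SEROTYPES = {"1", "4", "5", "6B", "7F", "9V", "14", "18C", "19F", "23F"}


def major_cc_serotype_pairs(isolates, min_isolates=16):
    """Split major (cc, serotype) pairs by PCV10 membership.

    `isolates` is an iterable of (cc, serotype) tuples, one per isolate.
    A pair is treated as major if it has more than 15 isolates.
    """
    counts = Counter(isolates)
    major = {pair: n for pair, n in counts.items() if n >= min_isolates}
    vaccine = {pair: n for pair, n in major.items() if pair[1] in PCV10_SEROTYPES}
    nonvaccine = {pair: n for pair, n in major.items() if pair[1] not in PCV10_SEROTYPES}
    return vaccine, nonvaccine


isolates = (
    [("CC217", "1")] * 161
    + [("CC289", "5")] * 44
    + [("CC989", "12F")] * 16
    + [("CC-X", "11A")] * 15  # hypothetical CC, one isolate short of "major"
)
vaccine, nonvaccine = major_cc_serotype_pairs(isolates)
```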

Population Genetic Structure Among PCV10 Serotypes
All 161 serotype 1 isolates from Kilifi that were genotyped were members of CC217^1 (notation: ST^serotype or CC^serotype, with the serotype written as a superscript; Table 2). Only 3 of the serotype 1 isolates were recovered from carriage; the rest were isolated from patients with invasive disease. Serotype 1 genotypes were previously reported to have a distinctive phylogeography [30]. Ancestral ST217^1 and its closely related single locus variants (SLVs) were shown to be the predominant serotype 1 STs in Kenya and Israel, and were later also shown to be common in other African countries, namely Ghana, Burkina Faso and The Gambia [31-33]. Other serotype 1 genotypes appear to predominate elsewhere [19,30]. Based on available data, CC217^1 still appears to be largely an African CC [19].
Vaccine serotype 4 isolates from Kilifi were predominantly invasive isolates of ST853, a member of the widely distributed CC246^4 (Table 2). CC246^4 has attracted recent interest because novel recombinants, arising from a serotype 4 to 19A capsular switch, emerged from this CC in the United States [34-36]. Only one CC246 isolate from Kilifi expressed an alternative serotype, serotype 2; it belongs to a cluster of serotype 2 isolates within CC246 that has also been reported in Asia and Africa [19].
All vaccine serotype 5 isolates (n = 44) were from cases of invasive disease, and all were members of CC289^5. ST289 is the widely distributed Pneumococcal Molecular Epidemiology Network (PMEN) clone represented by the Colombia5-19 strain [37]; although ST289^5 was common in Kilifi (n = 14 isolates), its SLV ST245^5 was more prevalent (n = 26 isolates). One major serotype 9V CC was identified (CC706^9V), and all CC706 isolates expressed serotype 9V. Of the 39 vaccine serotype 18C isolates characterised in this study, 35 were members of CC1381^18C. ST1381^18C shares only the xpt MLST allele with ST113^18C (PMEN clone Netherlands18C-36), a widely distributed genotype associated with serotype 18C [19].
For each of vaccine serotypes 6B, 14, 19F and 23F, two major CCs were circulating, and in each case these two CCs accounted for a large proportion of the isolates of that serotype: 45% of 107 serotype 6B isolates, 85% of 103 serotype 14 isolates, 64% of 104 serotype 19F isolates and 76% of 86 serotype 23F isolates (Table 2). Notably, all 24 Kilifi isolates of ST230 expressed serotype 14, and 83% of them were from invasive disease. CC230 is a widely distributed CC whose isolates predominantly express serotypes 14 and 19A, although ST230 (PMEN clone Denmark14-32) isolates have been reported with other serotypes [19]. ST230^19A increased in prevalence in the United States (US) and Spain after PCV7 implementation [35,36,38]. In Kilifi, a double locus variant (DLV) of ST230, ST700, was also prevalent, and its isolates exclusively expressed serotype 3. ST700^3 was the predominant serotype 3 genotype in this Kilifi collection, and ST700^3 isolates have also been found elsewhere in Africa. The common, globally distributed serotype 3 genotype is ST180^3 (PMEN clone Netherlands3-31), which is only distantly related to ST700^3 (the two share the gki and spi alleles) [19].
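Statements such as "ST700 is a DLV of ST230" or "ST180 and ST700 share the gki and spi alleles" reduce to a locus-by-locus comparison of allelic profiles. The sketch below shows that comparison; the function names are hypothetical, and the example profiles are invented for illustration rather than the real allele numbers of any ST discussed here.

```python
# The seven S. pneumoniae MLST loci, in allelic-profile order.
LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def shared_loci(profile_a, profile_b):
    """Names of the MLST loci at which two profiles carry the same allele."""
    return [locus for locus, a, b in zip(LOCI, profile_a, profile_b) if a == b]


def relationship(profile_a, profile_b):
    """Label two STs as identical, SLV, DLV, or by the number of differing loci."""
    differing = len(LOCI) - len(shared_loci(profile_a, profile_b))
    return {0: "identical", 1: "SLV", 2: "DLV"}.get(differing, f"{differing} loci differ")


# Hypothetical allelic profiles (not real MLST data).
st_a = (1, 2, 3, 4, 5, 6, 7)
st_b = (1, 2, 3, 4, 5, 9, 9)  # differs at xpt and ddl
st_c = (8, 8, 3, 8, 5, 8, 8)  # shares only gki and spi with st_a
```

Here `relationship(st_a, st_b)` returns "DLV", and `shared_loci(st_a, st_c)` returns `["gki", "spi"]`, the same kind of distant relationship described for ST180^3 and ST700^3.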
All serotype 12F isolates from Kilifi (n = 16) were members of CC989^12F, and isolates of this CC have also been found elsewhere in Africa [19]. The predicted ancestral ST, ST989^12F, shares only the spi MLST allele with the widely distributed ST218^12F (PMEN clone Denmark12F-34). Serotype 12F is generally uncommon among carriage isolates, which is also true of this Kilifi dataset (14 of 16 serotype 12F isolates were invasive). It has historically caused outbreaks [41,42], but it is not included in any currently available PCV.

Temporal Variation of CCs Containing Invasive Isolates Collected From 1994-2008
Because the Kilifi invasive isolate dataset spanned 15 years, we were able to explore temporal variation in the circulating STs and CCs over an extended period. Twelve CCs each comprised >10 invasive isolates over the period, and together they accounted for 68% (424/628) of the entire invasive collection (Figure 1). The prevalence of every CC varied over time, to a greater or lesser extent. Most obvious was the dominance of CC217^1 (n = 158), which accounted for 8-52% of the invasive isolates characterised each year. The next most prevalent CC was CC289^5 (n = 44), which fluctuated between 0% and 22% over the surveillance period. Other major CCs also varied in prevalence over the surveillance years. These longitudinal data will be particularly important for monitoring changes in the prevalence of individual CCs, especially CCs of nonvaccine serotypes such as CC230 (ST700^3), CC989^12F and CC847^19A, which accounted for 0-4%, 0-9% and 0-13% of the invasive isolates characterised each year, respectively, before PCV10 introduction. CCs not depicted in Figure 1 made only minor contributions to the overall pre-PCV population structure, contributing 0-3 isolates in any given year (data not shown), but these data will be re-evaluated in future post-PCV10 analyses.

Figures 2 and 3 depict temporal variation in the prevalence of several of the major STs circulating among children with invasive disease in Kilifi over the 15-year surveillance period. These STs were selected because multiple major STs (≥10 isolates each) expressed the same serotype. Serotype 1 isolates were recovered in every year of surveillance, and the three major STs associated with serotype 1 are closely related: ST217 is the predicted ancestor, and ST613 and ST614 are SLVs of ST217. Two or three of these STs were detected in 10 of the 15 surveillance years, and only one was detected in each of the remaining five years (Figure 2A). Figure 2B depicts the two major serotype 5 STs, which are SLVs of each other; despite their genetic similarity at the ST level, they were more restricted in when they were detected: ST289^5 predominated in the early years of surveillance and was replaced by ST245^5 in the later years. In contrast, Figures 3A and 3B depict, for each calendar year, the prevalence of a pair of unrelated serotype 14 STs and a pair of unrelated serotype 23F STs, respectively. The two serotype 14 STs circulated concomitantly in half of the surveillance years, and both serotype 23F STs were detected in four of the surveillance years (note that only 10 ST988^23F isolates were detected overall). Figure 4 completes the picture for the major circulating STs: each of these STs was the only major genotype of its serotype, yet none of them circulated in every year of the surveillance period. Although ≥10 isolates was the criterion for a major ST and only nine ST989^12F isolates were recovered over the study period, the ST989^12F data are shown to demonstrate that there was no serotype 12F epidemic during this surveillance period.
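The two temporal summaries used above, the yearly percentage of invasive isolates belonging to a CC and the number of years in which several STs of one serotype were detected together, can be computed as sketched below. The record layout, function names and example records are all hypothetical; the real per-year data are those shown in Figures 1-4.

```python
from collections import Counter, defaultdict


def yearly_percentages(records, cc):
    """Percentage of each year's characterised invasive isolates belonging to `cc`.

    `records` is an iterable of (year, cc, st) tuples, one per isolate.
    """
    totals = Counter()
    hits = Counter()
    for year, rec_cc, _st in records:
        totals[year] += 1
        if rec_cc == cc:
            hits[year] += 1
    return {year: 100 * hits[year] / totals[year] for year in sorted(totals)}


def cocirculation_years(records, sts, min_sts=2):
    """Years in which at least `min_sts` of the given STs were detected."""
    detected = defaultdict(set)
    for year, _cc, st in records:
        if st in sts:
            detected[year].add(st)
    return sorted(year for year, found in detected.items() if len(found) >= min_sts)


# Hypothetical records for three years (not the Kilifi data).
records = [
    (1994, "CC217", "ST217"), (1994, "CC217", "ST613"),
    (1994, "CC289", "ST289"), (1994, "CC230", "ST230"),
    (1995, "CC217", "ST217"), (1995, "CC289", "ST289"),
    (1996, "CC217", "ST614"), (1996, "CC217", "ST217"),
    (1996, "CC217", "ST613"), (1996, "CC289", "ST245"),
]
cc217_by_year = yearly_percentages(records, "CC217")
serotype1_years = cocirculation_years(records, {"ST217", "ST613", "ST614"})
```

The range reported for a CC (e.g. 8-52% for CC217^1) is then simply the minimum and maximum of the yearly percentages; in this toy example `cc217_by_year` is `{1994: 50.0, 1995: 50.0, 1996: 75.0}` and the serotype 1 STs co-circulate in 1994 and 1996.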

Discussion
The introduction of PCV10 is vitally important for Kenya because the morbidity and mortality associated with pneumococcal disease are high. The incidence of invasive disease associated with PCV10 serotypes has decreased since vaccination began [43], with a marked attenuation of morbidity and mortality among vaccinated children, and potentially also among unvaccinated older children and adults if herd protection proves sufficient. Introduction of PCV10 in Kenya is expected to perturb the pneumococcal population genetic structure, but a key question is whether these perturbations will diminish the overall benefit of vaccine introduction by increasing nonvaccine serotype disease. By comparison, the US has used PCV for the longest period of time, and the most notable nonvaccine serotype increase there has been in serotype 19A disease, which was partly due to the emergence of a novel genotype [34-36]. Importantly, this new recombinant (ST695^19A) quickly became the third most common serotype 19A CC causing invasive disease in the US [35], which emphasises the importance of detecting such population-level genetic changes.
Post-vaccine changes in the US were revealed because the pre-vaccine pneumococcal population genetic structure there was well defined. Active, population- and laboratory-based invasive disease surveillance has been ongoing in the US since 1995 (5 years before PCV was introduced), and two studies specifically characterised large, representative collections of pneumococci to identify the pre-PCV7 baseline set of genotypes circulating in the US [14,20]. The Kilifi invasive pneumococcal disease surveillance programme is also active and population- and laboratory-based, and it collected isolates and data for 15 years before PCV introduction. Several carefully designed pneumococcal carriage studies were also performed during that time [8,25]. These data (patient demographics, clinical outcome and serotype), combined with the genotyping work described here, provide a comprehensive description of the pre-vaccine pneumococcal population in Kilifi.
MLST has been used to characterise many thousands of pneumococci collected across Europe, North and South America, Australia and Asia, and many of the major STs within individual countries have disseminated across continents, e.g. ST81^23F,19F, ST90^6B, ST156^9V, ST9^14, ST124^14, ST113^18C, ST218^12F, ST191^7F, ST180^3 and ST199^19A [19]. Many such STs have been identified as PMEN clones, a designation based in part on their widespread distribution [19,37]. Our study demonstrated that although the major serotypes found in Kilifi and elsewhere in Africa are similar to those found in the rest of the world, most of the major STs/CCs are different, at least as far as can be judged from the data in the MLST database. It is important to note that contributing data to the MLST database is voluntary: investigators must submit new alleles and new STs to the database for assignment, but only rarely do they submit their entire study dataset. Once an ST has been added to the database, investigators who subsequently detect that ST elsewhere are under no obligation to inform the curators. Thus, the MLST database reliably captures allelic diversity, but because it does not record every isolate that has been genotyped, we cannot know for certain whether some of the STs detected in Kilifi are truly African in origin, even though that is what the database suggests. With those caveats, the MLST database currently contains over 9,000 STs and more than 21,000 isolates recovered from across the globe, and Africa is known to be generally underrepresented in the database compared with other geographical regions. These data therefore substantially increase our understanding of the pneumococcal population structure in Kilifi, and possibly in Africa more broadly.
However, because many of the major STs identified in Kilifi differ from the major STs of the same serotypes that circulate globally, there is little existing knowledge on which to predict whether they will increase in prevalence or evolve, so follow-up studies will be essential.
An interesting observation was the temporal fluctuation in STs that expressed the same serotype, although for some of these serotypes the numbers observed each year were small. There were no major changes in Kilifi District over the surveillance period (e.g. in laboratory methods or practice, antimicrobial use or stewardship, or early vaccine uptake) that would have caused large fluctuations in the circulating serotypes or genotypes. The prevalence of circulating serotypes is known to vary, so temporal fluctuations in the circulating serotypes and genotypes in Kilifi were not surprising [44-47]; however, our data might suggest that for serotypes 1, 14 and 23F it mattered less which of the STs associated with these serotypes was circulating. Previously published papers have debated whether serotype or MLST genotype plays the more important role in an isolate's potential to cause invasive disease [22,48-50]. The argument favouring a primary role of serotype is supported by our observation that different major genotypes expressing the same serotype were detected concomitantly in Kilifi: the three serotype 1 STs were closely related, but the pairs of serotype 14 and 23F STs were unrelated. In contrast, the two serotype 5 STs, although closely related at the MLST loci, appeared to be more restricted in their circulation. Perhaps the two serotype 5 STs differ markedly elsewhere in the genome, so that one serotype 5 genotype can outcompete the other, making co-circulation less likely. The immunological response is known to differ between pneumococcal serotypes [51], so it may be the serotype-specific immune response that largely determines which serotypes circulate, regardless of the genotypic background on which they are expressed.
More likely, the temporal patterns of ST circulation we observed in Kilifi are best explained by a particular combination of serotype (and the corresponding immunity within the human population) and genotype.
Several major CCs comprised isolates of nonvaccine serotypes (10A, 13, 15A, 15BC and 35B) that were recovered primarily from healthy children; because these CCs were also recovered from children with invasive disease, the potential for serotype replacement disease remains. Serotypes 10A, 15BC and 35B increased in prevalence in the US after PCV7 introduction [14,15,23,52], but the predominant STs there differed from those found in Kilifi, so genotype offers minimal predictive power [14,23,24]. Nonvaccine serotypes 3, 12F and 19A were already prevalent before vaccine introduction, and their ability to cause invasive disease in Kilifi and elsewhere is clear. Serotypes 3 and 19A significantly increased in prevalence after vaccine implementation in the US, whereas serotype 12F significantly decreased [14,15]. Serotype 19A has also significantly increased in prevalence elsewhere [40,53].
In this study we established the baseline set of genotypes in Kilifi before PCV10 introduction, which will allow the detection of changes in the prevalence of pre-existing STs and the identification of new nonvaccine STs (as putative imports or new recombinants). Any apparent increases or decreases in serotype or genotype prevalence after PCV10 vaccination is established in Kilifi must be considered in the context of this pre-vaccine genotypic landscape.

Table S1. Clonal complexes and sequence types found in the invasive pneumococcal collection, stratified by serotype.