March 2019 dengue fever outbreak at the Kenyan south coast involving dengue virus serotype 3, genotypes III and V

The first description of a disease resembling dengue fever (DF) was in the 15th century slave trade era by Spanish sailors visiting the Tanzania coast. The disease, then associated with evil spirits is now known to be caused by four serotypes of dengue virus (DENV1-4) that are transmitted by Aedes mosquitoes. Kenya has experienced multiple outbreaks, mostly associated with DENV-2. In this study, plasma samples obtained from 37 febrile patients during a DF outbreak at Kenya’s south coast in March 2019 were screened for DENV. Total RNA was extracted and screened for the alpha- and flavi-viruses by real-time polymerase chain reaction (qPCR). DENV-3 was the only virus detected. Shotgun metagenomics and targeted sequencing were used to obtain DENV whole genomes and the complete envelope genes (E gene) respectively. Sequences were used to infer phylogenies and time-scaled genealogies. Following Maximum likelihood and Bayesian phylogenetic analysis, two DENV-3 genotypes (III, n = 15 and V, n = 2) were found. We determined that the two genotypes had been in circulation since 2015, and that both had been introduced independently. Genotype III’s origin was estimated to have been from Pakistan. Although the origin of genotype V could not be ascertained due to rarity of these sequences globally, it was most related to a 2006 Brazilian isolate. Unlike genotype III that has been described in East and West Africa multiple times, this was the second description of genotype V in Kenya. Of note, there was marked amino acid variances in the E gene between study samples and the Thailand DENV-3 strain used in the approved Dengvaxia vaccine. It remains to be seen whether these variances negatively impact the efficacy of the Dengvaxia or future vaccines.


Introduction
Dengue fever (DF), believed to be a corruption of Kiswahili words "Ka-dinga pepo" signifying a disease characterized by a sudden cramp-like seizure, caused by an evil spirit was first described in 15th century by Spanish sailors visiting the then Tanganyika (Tanzanian today) coast [1]. Despite these early reports, there exist sparse data on the extent of its transmission, expertise and awareness of the disease by the clinicians [32]. More recently, DENV-1-4 were detected in a 2014-2015 pediatric febrile surveillance study in Western Kenya [33,34].
In the past decade, there has been increased reports of DF outbreaks at the Kenyan coast probably due to improved diagnosis, surveillance, and enhanced spread of the virus as a result of the ever increasing trans human movement from endemic to non-endemic areas. Collectively, the Viral Pathogen Resource (VIPR) [35] contained over 24,295 DENV genome sequences as at March 2021, of which only 262 are from Africa (DENV-1 = 47, DENV-2 = 98, DENV-3 = 74 and DENV-4 = 13). Of these, Kenya has contributed 51 sequences (DENV-1 = 5, DENV-2 = 30, DENV-3 = 8 and DENV-4 = 8). To redress the surveillance gap and under-representation of Africa DENV sequences, the US Army has invested heavily in DENV genomic surveillance. A recent genomic study on Kenya's DENV-2 obtained during a 2017 DF outbreak in Malindi indicated trends toward the emergence of new immunogens that are relevant to vaccine design [28]. In an effort to continue to remedy the scarcity of Kenyan DENV genome sequences, we report on DENV-3 sequences obtained from a 2019 DF outbreak at the Kenyan south coast. 37

RNA extraction, DENV serotyping and sequencing
Total RNA was extracted from the plasma samples using Genetic Signature's virus nucleic acids extraction kit (Genetic Signatures, NSW, Australia) on the MagPurix Evo automated extraction System (Zinexts Life Science Corp, Taiwan). An aliquot of the extracted RNA was screened for the presence of alpha-and flavi-viruses using the EasyScreen typing kit (Genetic Signatures, NSW, Australia) according to the manufacturer's instructions and as previously described [36].
RNA for sequencing was extracted using the Direct-zol miniprep kit (Zymo Research, CA, USA). To enrich for viral pathogens, the host nucleic acids were depleted by treatment with Ribo-Zero Gold rRNA removal kit (Illumina, CA, USA). cDNA synthesis was performed by sequence-independent single-primer amplification (SISPA) [37], followed by cDNA amplification using MyTaq DNA polymerase (Bioline, MA, USA). Sequence libraries were prepared using Nextera XT kit (Illumina, CA, USA) and a 12 pM library was then sequenced using MiSeq reagent v.3 600 cycles (Illumina, CA, USA) on MiSeq platform (Illumina, CA, USA).
To obtain complete E genes from the specimens with partial DENV sequences, cDNA from these samples was amplified using E gene specific primers. Briefly, the complete DENV-3 E gene (~1,479 bp) was amplified in a nested PCR using primers that were designed in house using NCBI's Primer-Blast tool [38]. For the primary amplification, the forward DENV-3-610E: CGACAAGAGATCAGTGGCGT and reverse (DENV-3-2999E: GCCATGTGCAGGTTTT-CACC primers were used. For the subsequent nested amplification, a second primer pair comprising DENV-3-824E: AAGGTCTGTCAGGAGCTACG as forward primer and DENV-3-2410E: CTTCCACCTCCCACACATTCC as reverse primer were used. The PCR reactions contained 2 μL of the cDNA, 2 μL of 0.5 μM primer sets and 1 μL of 5 U of MyTaq DNA polymerase (Bioline, MA, USA). Thermal cycling conditions for the first and second PCR were similar and included initial denaturation at 95˚C for 1 min, 35 cycles of denaturation at 95˚C for 1 min, annealing at 56˚C for 30 sec, extension at 72˚C for 1 min and final extension at 72˚C for 5 min. Non-target controls (PCR water) were used to track contamination. Amplified products were visualized on 1% agarose gels stained with GelRed (Biotium Inc., USA). Amplicons were cleaned using AmpureXP beads (Beckman Coulter, CA USA) and used to make sequence libraries with the Nextera XT kit (Illumina, CA, USA). 12 pM of the library was sequenced on the MiSeq platform, with 600 v.3 paired end chemistry (Illumina, CA, USA).

Sequence assembly and phylogenetics
Demultiplexed sequence reads from both shotgun RNA sequencing and targeted sequencing were retrieved from the Miseq. The reads were processed using an in-house pipeline. In brief, the raw reads were passed through Trimmomatic v 0.22 to remove short sequences (less than 40 nucleotides), remove sequencing adapters, failed reads and reads with poor base quality scores (<Q30). The reads were then filtered using cutadapt v 1.1 to clip surviving adapters and SISPA primers. For the shotgun assay, the clean reads were mapped against the host (human genome reference-Hg38) using bowtie v2 to subtract reads matching to the human genome. The host depleted reads were then assembled de novo using Ray v2.3. The generated contigs were taxonomically identified by querying them against other DENV sequences using BLASTn in the non-redundant nucleotide database (GenBank). The DENV genome whose homology was closest to our sequences (GenBank accession: MK894339.1) was retrieved and used for a reference guided assembly in the CLC Genomics workbench v 8.5.1 (QIAGEN, CA, USA).
The sequences from the targeted (E gene sequencing) and shotgun assays (whole genome) were mapped against the MK894339.1 reference genome using CLC Genomics workbench v 8.5.1 with default parameters. To call the consensus sequence, a minimum base depth coverage of 5 and the majority consensus rule was employed. In case of positions with conflicting base calls, quality scores were used to resolve conflicts.
To determine the phylogenetic relationships between the Kenyan DENV-3, a comprehensive subset of curated, annotated and published DENV-3 dataset was obtained from VIPR [35]. The datasets were down sampled by geographical location, year of collection, genotype and presence of the full genomes, or the presence of complete E genes (see S2 Table). In brief, VIPR database was accessed on 14 th March 2021 to retrieve context sequences for phylogeny. Only complete sequences with a known collection date, genotype, and geographical location were used. Sequences were clustered and cluster representatives selected ensuring consideration of the indicated set criteria. Where multiple cluster members shared similar temporalspatial information, representatives were randomly selected. These global datasets and the Kenyan samples were used to conduct two multiple sequence alignments with Muscle v 3.8 [39], the first involving the complete polyprotein coding sequence, and another involving the entire E protein. Alignments were manually edited using CLC Genomics workbench v8.5.1 in order to fix any misalignments that might have been inadvertently introduced by the alignment algorithm. To avoid using sequences from recombination of different virus strains during sequence assembly, the aligned sequences were run in RDP4 software suite using RDP GENECONV and Bootscan [40]. Sequences deemed to have recombinants were excluded from downstream analysis. Maximum likelihood (ML) trees were inferred in PhyML v3.1 [41] using the Generalized time-reversible (GTR) substitution model with gamma distribution and invariant sites (GTR+Γ+I) determined by AIC and BIC tests in jModelTest v2. Branch support was estimated by approximate likelihood ratio test values [42].

Time-scaled genealogies
A strict molecular clock with coalescent constant size was used for molecular time clock analysis of the E gene. The suitability of the E gene tree for use in the estimation of temporal parameters was first evaluated in TempEst software v1.5.3 [43]. Root-to-tip regression indicated adequate temporal signal (correlation coefficient = 0.6 for n = 398 sequences). Time-scaled phylogeny was then inferred using BEAST v 1.10 package [44]. The GTR+Γ+I model of substitution estimated by jModelTest2 [42] was used to estimate the time tree. 200 Million MCMC chains were performed, and convergence of runs were estimated in Tracer v1.7.1 [43] ensuring all effective sample sizes were >200. The maximum-clade credibility tree was summed with Tree Annotator and visualized in FigTree v1.4.3.

Homology estimates of the DENV-3 E protein to the Thailand DENV-3 strain used in the approved Dengvaxia vaccine
CLC Genomics Main workbench v8.5 was used to create alignments and compute genetic distances between the 17 DENV-3 E genes from this study and to the DENV-3 strain from Thailand DENV-3 (PaH881/8 (AF349753) that is used in the approved Dengvaxia vaccine [45]. The amino acid comparison was performed by aligning the study sequences with the Thailand strain PaH881/88 sequence available in the GenBank. Comparisons were also made to other African DENV-3 sequences available in the GenBank, and this information is shown in S1 Table.

The 2019 dengue fever outbreak was caused by DENV-3
Only DENV-3 was identified at screening and was present in 21 out of the 37 specimens examined. Only 4/21 samples yielded complete genomes. The rest (17/21) yielded partial genomes, of which only 3/17 had the complete E gene. The remaining 14 samples were sequenced using targeted amplicon approach and 10 yielded complete E gene. Four samples failed to produce E gene on targeted sequencing. Table 1 shows the relationship between the qPCR cycle threshold (Ct) values and the subsequent success of obtaining whole genomes from the sequencing effort. By Shotgun metagenomics, four specimens with Ct values between 20 and 28 generated complete DENV-3. Three specimens with Ct values between 27 and 32 generated partial DENV-3 genomes but with the complete E gene. Ten specimens with Ct values between 25 and 36 generated partial DENV-3 genomes with incomplete E gene. Complete E gene was obtained after targeted sequencing of the 10 samples. Four samples with Ct values >36 failed to produce E gene on targeted sequencing.

Phylogenetic tree derived from the complete genome of Kenya DENV-3
Whole genomes from the 4 Kenya DENV-3 samples and those sampled from the global dataset (n = 221) were used for phylogenetic analysis. The closest polyprotein gene homologues (99.73% nucleotide identity) was with a DENV-3 of genotype III that was identified in a Chinese traveler who had visited Tanzania (Accession number MK894339.1). The sequences also shared a most recent common ancestor (MRCA) with the Asian DENV-3 from Pakistan.

DENV-3 genotype III and V were co-circulating during the 2019 DF outbreak at the Kenyan coast
DENV-3 E genes from this study (n = 17) and those sampled from the global dataset (n = 383) were used to infer phylogenetic trees. Both the ML and Bayesian phylogenetic trees were concordant. As shown in Fig 2A-2C), majority of the samples (n = 15, 88.2%) branched with the genotype III ( Fig 2B) in a well-supported monophyletic clade (aLRT and Posterior probability = 1). The others (n = 2, 11.8%) branched with genotype V (Fig 2C). Genotype III ( Fig  2B) branched together with a sequence from China (Genbank accession number MK894339). This clade shared a MCRA with Asian genotype III from Pakistan and India. Genotype III E sequences had a nucleotide sequence similarity > 99.3% (S1

DENV-3 from the 2019 Coastal Kenya outbreak has been in circulation since 2015
The molecular clock analysis was used to estimate the evolutionary rates of the dataset (n = 398 sequences) under a strict coalescent constant size with the least likelihood (-18826.8).
The substitution rate of DENV-3 was estimated to be 7.7E-4 (95% Highest Posterior Density HPD: 6.9-8.4). As shown in Fig 3, the molecular clock analysis indicated that the TMRCA for the Kenyan DENV-3 genotype III was in 2015 (95% (HPD): 2013-2016, node probability = 1)  Table 2 shows the amino acids comparison between the 17 DENV-3 E proteins from this study and the PaH881/8 (AF349753) strain from Thailand that is incorporated in the currently approved tetravalent, live attenuated Dengvaxia vaccine [46]. There are 23 amino acid variances between study strains and the vaccine strain. Similar variances were observed in the other African DENV-3 sequences available in the GenBank (S1 Table).

Discussion
DF outbreaks at the Kenyan coast have recently become an annual occurrence [2, 24-28, 34, 47]. The cause of this increased transmission is probably multi-factorial, including the rapid movement of people from one region to the other, increase in population growth, increased urbanization [48] and probably changes in weather patterns that may be favoring increased activities of Ae. Aegypti and Ae. albopictus, the mosquito vectors responsible for transmitting DENV within countries neighboring the Indian Ocean [49].

Table 2. Sequence alignment of the 17 Kenyan DENV-3 E proteins to the DENV-3 vaccine strain from Thailand (AF349753).
The first 15 amino acid sequences are genotype III and the last 2 are genotype V. Position 10 13 14 81 93 124 147 154 169 217 270 271 275 301 340 377 383 391 426 447 452 478 489 Genotypes   AF349753  R E G I  K S  N  D  V  P  N  S  G  L  G  V  K  K  V  S  I  V  A  III   MTW-341  ---V -P  D  E  T  ----T  E  I  N  ---V  I  -MTW-419  ---V -P  D  E  T  ----T  E  I  N  ---

PLOS GLOBAL PUBLIC HEALTH
Dengue fever outbreak involving DENV-3, genotypes III and V Most cases of DF occur during the rainy season that results in accumulation of water pools which provide breeding habitats for mosquitoes [47]. The hot and humid tropical climate at the Kenyan coast receives bimodal rainfall with long rains experienced between March and June and short rains from August to October [50]. In the current study, we sought to characterize DENV-3 strains associated with a DF outbreak that occurred in March 2019 at south coast of Kenya.
Unlike in the previous DF outbreaks that were caused by DENV-2 of the cosmopolitan lineage [25][26][27][28], the March 2019 outbreak was caused by DENV-3. The last time DENV-3 was reported in Kenya was in 2011-2014 in Mandera, and these outbreaks are believed to have originated from Somalia [23,24]. Before that, DENV-3 had been reported in 1984 and 1985 at the coastal region of the Indian Ocean, specifically Pemba and Mozambique [51,52]. It was later detected between 1992 and 1993 in the United States troops serving in Somalia, and also in the areas neighboring the Persian Gulf [53]. In other parts of Africa, DENV-3 has been reported in West Africa (Senegal, Côte d'Ivoire, Togo, Benin) and East Africa (Djibouti, Somalia, Tanzania, Madagascar) and phylogenetic studies trace these outbreaks to the Indian subcontinent [54][55][56].
Advanced characterization of DENV relies on WGS. As shown in Table 1, our attempts to obtain complete genomes were only successful in 4 out of 21 specimens that had tested positive for DENV-3 by qPCR. Using qPCR Ct values as surrogate for viral load, we evaluated whether the failure to obtain WGS following shotgun metagenomics was related viral load, as has been suggested by Thorburn et al., [57] and Cruz et al., [58]. As shown in Table 1, the Ct values of the 4 samples with complete genomes (>10,000 bp) ranged from 20 to 28. Other specimens with similar Ct values produced partial genomes or needed targeted sequencing in order to generate the complete E gene. It was noted that samples producing complete sequences had over 1,459,000 reads mapping to DENV-3. Samples that produced incomplete genomes had much lower sequence reads (�539,000) while samples with Ct values �37 did not generate DENV sequences even after targeted sequencing for E gene. To better quantify the inverse relationship of Ct values and recovery of complete genomes, future studies would benefit from the use of a control/s with known copy numbers. We speculate that the failure to obtain WGS when Ct values were within "sequenceable" range was due to factors such as sample integrity and/or presence of background genomes from the host or commensal microbiome, as has been alluded to in a previous reports [59].
Four DENV-3 with complete polyprotein-coding sequences (10,173-10,176 bp) were assembled. The coding sequence lies between nucleotides positions 81 and 10,250, and encodes a polyprotein of 3,392 amino acids. As shown in Fig 1, the closest polyprotein gene homologues to the 4 sequences with 99.73% nucleotide identity was with a DENV-3, genotype III that was identified in a Chinese traveler (GenBank accession number MK894339.1) who had visited Tanzania in 2018. Tanzania has experienced several DF outbreaks in the last decade, the most recent having occurred in 2018 and was associated with DENV-1 and DENV-3 [60]. It is likely that the outbreak reported by Chipwaza et al., [60] could have seeded the 2019 DF outbreak at the Kenyan coast.
Whole genome sequencing is considered the gold standard for DENV systematics [17], but because there more partial genomes in global repositories than full genomes, genes such as NS1, NS3 and NS5 and E genes are also used. Despite the fact that NS1, NS3 and NS5 exhibit significantly higher phylogenetic resolution than E-gene [17], the latter has been the choice of many phylogenetic studies mostly because of its historical role in diagnostics, its influence on viral infectivity and immunogenicity [61][62][63][64]. The latter consideration is crucial for evaluating potential mismatch of the currently licensed E gene based (Dengvaxia) to circulating DENV strains. Importantly, E gene sequences are over represented in public repositories. For

PLOS GLOBAL PUBLIC HEALTH
Dengue fever outbreak involving DENV-3, genotypes III and V instance, globally for DENV-3, there are 4,678 E gene sequences compared to 1,420 full genomes, 2,003 NS1, 1,810 NS3, and 1,977 NS5 as of 4 th November 2021 [35]. Africa DENV-3 sequences are far fewer in the VIPR repository, but still, E-gene sequences predominate: 89 compared to 11 full genomes, 35 NS1, 31 NS3 and 33 NS5. Thus, phylogenies constructed with E gene are more robust. For these reasons, the E genes from the 17 samples (4 from samples with complete DENV-3 genomes, 3 from samples with complete E gene following shotgun metagenomics, and 10 with full E gene following targeted sequencing, Table 1) were used to infer phylogenetic relationships. As shown in Fig 2A, of the 17 samples, 15 clustered with genotype III (Fig 2B) and two with genotype V (Fig 2C). This is not the first report of the two dengue genotypes co-circulating in the same outbreak. A similar occurrence was reported in Guangzhou China in 2009 where DENV-3 genotype III and V were in circulation during the same period [65]. Fig 3, shows the Bayesian time-scaled phylogeny of the DENV-3 E sequences from this study together with global representative subset. As shown in Fig 3B, which is an outcrop from Fig 3A, all the Kenyan genotype III sequences branched with those from Pakistan, suggesting a high likelihood that the strains share genealogy with those from Pakistan. Pakistan has experienced multiple DF outbreaks over the past two decades [66]. The most recent occurred in Peshawar, Capital city of Khyber Pakhtunkhra, Pakistan in 2017 where DENV-2 and DENV-3 were dominant [67]. It has been suggested that, the Pakistan DENV strains trace their evolutionary history to the East African coast [1]. The tree topology of the genotype V clade ( Fig  3C) has two lineages with their TMRCA dating back to 1949 (95%HPD: 1945-1953). Lineage one contained two Kenyan south coast samples from this study (MTW-3158 and MTW-364) while lineage two contained four previously reported genotype V that were collected in 2015 from Western Kenya [29] (MT076948, MT076951, MT076950 and MT076953). It is tempting to speculate that two divergent genotype V lineages are in circulation in Kenya, probably geographically split into a coastal and Western Kenya population. We are cognizant of the limitation of this speculation, especially due to under sampling of DENV-3 genotype V in the country as well as globally. Currently, apart from the Brazil (2006) and Kenyan samples (2015 and 2019), the clade is populated by old viruses, the most recent being from China (1980). Members of the genotype V lineage are rarely reported and have previously even been considered extinct [68].
Time-scaled genealogies indicate that the DENV-3 genotype III reported in the current study have been in circulation since 2015 (Fig 3B). Considering sequences from this 2019 outbreak branched with a 2018 sequence from a Chinese traveler returning from Tanzania, there is a high probability that DENV-3 genotype III is endemic in East Africa though these conclusions suffer from the limitation of under sampled DENV-3 genomes in East Africa. Genotype V is also estimated to have been in circulation in 2015. However due to the rarity of genotype V sequences globally, it is difficult to clarify the genealogical time-scaled trees in this clade.
There are only 89 African DENV-3 sequences in the GenBank as at 04 November 2021 [35]. Our study adds 17 more DENV-3 sequences. As shown in Table 2, there are 23 amino acid variances between DENV-3 E sequences in the study samples and the DENV-3 strain from Thailand that is a component of the tetravalent, live attenuated Dengvaxia vaccine [46]. Similar variances were observed in the other African DENV-3 sequences available in the Gen-Bank (S1 Table). It is not clear how much impact such variances will have in the efficacy of the current Dengvaxia vaccine or future vaccines. As a limitation, it is important to point out that the observed E protein variances did not include the sequences encoding the DENV pre-membrane (prM) gene, which together with the E protein constitute the tetravalent Dengvaxia vaccine.

Conclusion
DENV-3 strains of genotypes III and V were identified to have been circulating locally years before the outbreak, with their TMRCA dating back to 2015. Our study suggests that DENV-3 circulating at the Kenyan coast share genealogy with those from Asia, especially Pakistan and this could have been facilitated by tourism and trade. The marked amino acid variances between the study samples and the Thailand DENV-3 strain used in the approved Dengvaxia vaccine may help inform future design of dengue vaccines. The fact that these variances, albeit from a small sample set, were similar to those in other African DENV-3 sequences and different from the vaccine representative strain from Thailand are indicative of the need to obtain more DENV sequences from African countries.  Table. Dengue isolates used in this study for the E-gene phylogenetic analysis. The datasets were down sampled by geographical location, year of collection, genotype and presence of the full genomes, or the presence of complete E genes. (DOCX)