Genetic diversity of the Plasmodium falciparum GTP-cyclohydrolase 1, dihydrofolate reductase and dihydropteroate synthetase genes reveals new insights into sulfadoxine-pyrimethamine antimalarial drug resistance

Plasmodium falciparum parasites resistant to antimalarial treatments have hindered malaria disease control. Sulfadoxine-pyrimethamine (SP) was used globally as a first-line treatment for malaria after wide-spread resistance to chloroquine emerged and, although replaced by artemisinin combinations, is currently used as intermittent preventive treatment of malaria in pregnancy and in young children as part of seasonal malaria chemoprophylaxis in sub-Saharan Africa. The emergence of SP-resistant parasites has been predominantly driven by cumulative build-up of mutations in the dihydrofolate reductase (pfdhfr) and dihydropteroate synthetase (pfdhps) genes, but additional amplifications in the folate pathway rate-limiting pfgch1 gene and promoter, have recently been described. However, the genetic make-up and prevalence of those amplifications is not fully understood. We analyse the whole genome sequence data of 4,134 P. falciparum isolates across 29 malaria endemic countries, and reveal that the pfgch1 gene and promoter amplifications have at least ten different forms, occurring collectively in 23% and 34% in Southeast Asian and African isolates, respectively. Amplifications are more likely to be present in isolates with a greater accumulation of pfdhfr and pfdhps substitutions (median of 1 additional mutations; P<0.00001), and there was evidence that the frequency of pfgch1 variants may be increasing in some African populations, presumably under the pressure of SP for chemoprophylaxis and anti-folate containing antibiotics used for the treatment of bacterial infections. The selection of P. falciparum with pfgch1 amplifications may enhance the fitness of parasites with pfdhfr and pfdhps substitutions, potentially threatening the efficacy of this regimen for prevention of malaria in vulnerable groups. Our work describes new pfgch1 amplifications that can be used to inform the surveillance of SP drug resistance, its prophylactic use, and future experimental work to understand functional mechanisms.


Introduction
The Plasmodium falciparum parasite inflicts high morbidity and mortality on human populations in malaria endemic regions, especially in sub-Saharan Africa. To inform control measures, investigations of P. falciparum adaptation for host immune evasion, antimalarial drug resistance and other important biological mechanisms have focused on analyses of genomewide polymorphisms [1,2]. A number of studies have revealed SNPs and structural variants (e.g. duplications, amplifications or copy number variants) linked to antimalarial drugs, such as chloroquine, sulfadoxine-pyrimethamine (SP) and artemisinin [3]. SP is a combination drug which inhibits the dihydrofolate reductase (dhfr) and dihydropteroate synthetase (dhps) enzymes in the folate pathway of the parasite, and is widely used as intermittent preventive treatment of malaria in pregnancy (IPTp) and in infants as part of seasonal malaria chemoprophylaxis (SMC) in sub-Saharan Africa. SP became a first-line treatment for malaria after widespread resistance to chloroquine, but was replaced by artemisinin combination therapies for uncomplicated cases with the increasing prevalence of P. falciparum mutant alleles that confer the parasite resistance to pyrimethamine in pfdhfr (N51I, C59R, S108N, I164L) and to sulfadoxine in pfdhps (I431V, S436A/F, A437G, K540E/N, A581G, A613S/T) genes [4,5]. Sequential accumulation of the several point mutations leads to increased levels of resistance by reducing binding affinity of the drug to the folate pathway enzymes dhps and dhfr [6].
Evidence across many countries has shown that longer term usage of SP leads to a greater risk of resistance haplotypes in pfdhps and pfdhfr genes [1]. The increased prevalence in parasites with resistant haplotypes is due to selection by drug pressure from the use of SP for IPTp and SMC as well as the use of anti-folate containing antibiotics. Fixation of some of the key SP-resistant mutations in the parasite population may occur, despite discontinuation of SP as the first-line treatment for more than a decade [7], and understanding these genetics dynamics is crucial for malarial disease control. Genomic analyses of the malaria parasites have revealed copy number variations of the GTP cyclohydrolase I gene (pfgch1), which encodes the first and the rate-limiting enzyme in the folate biosynthesis pathway. Increased copy number of pfgch1 has been linked to SP resistance in Southeast Asia [8], with a direct association to pfdhfr and pfdhps alleles [9]. Similarly, a pfgch1 promoter copy number variation (amplification) in Malawian parasites with quintuple mutations (I51-R59-N108-G437-E540) has been identified, which differs from the whole gene amplification found in Southeast Asia [1,10]. The multiple copies of pfgch1 are thought to compensate for the putatively fitness-reducing mutations in pfdhfr and pfdhps by providing higher concentrations of upstream substrates in the folate-biosynthetic pathway [10].
Here we investigate the genetic diversity in the pfgch1, pfdhfr and pfdhps genes across 4,134 P. falciparum isolates from 29 malaria endemic populations. Using both long and short sequence data, we reveal there are multiple forms of pfgch1 promoter and gene copy number variations, whose frequencies are heterogeneously distributed geographically, and linked to pfdhfr and pfdhps haplotypes. Worryingly, we reveal an overall trend towards an increased prevalence of the known pfdhfr and pfdhps resistant markers, which may be attributed to SP drug pressures, and that the presence of amplified pfgch1 is exacerbating the existing problem and may be assisting with maintaining these resistant parasites and the evolution of new mutants.

Short read sequencing data
Publicly available raw sequence data were downloaded for P. falciparum field isolates (n = 6,236; 30 countries; from 2001 to 2015) from the European Nucleotide Archive (study accessions ERP000190 and ERP000199). These data include Illumina raw sequences from the MalariaGEN Community Project [11]. The data were aligned to the 3D7 strain reference genome (v3.1) using bwa-mem software (default parameters, except -c 100 -T 50). Multiplicity of infection (MOI) was calculated using estMOI software [12] and the threshold was obtained based on previous work [13]. Samples with MOI >1 (estMOI > 30%) and overall genomic coverage <10-fold were removed from the dataset, leading to 5,280 isolates used for further analysis. SNPs and small indels were called using the samtools software suite, and those with a minor allele frequency >1% in drug resistance loci retained. The final dataset contained 4,134 samples (see pathogenseq.lshtm.ac.uk for a list of ENA accession numbers) without missing or mixed SNP genotype calls in pfdhfr, pfdhps and pfgch1. Large duplications were called by applying Delly software (v0.8.1) with default settings, and their breakpoints inferred using split-reads. All structural variants of low-quality (supporting paired-end calls < 3 or average mapping quality < 20) and >100kbp in length were removed, with 1,323 potential events located in the pfgch1 gene locus or its promoter region. All isolates with multiple calls in the region of interest were also removed and 1,273 remaining duplications were clustered (start and end breakpoints ± 50) to create 10 unique events. Coverage at a chromosome, regional and locus level was calculated from the alignment files for each isolate, and used to infer a pfgch1 gene and promoter copy number in those isolates with amplifications. The copy number estimation involved normalisation using a median coverage of isolates with no duplications. African samples were classified into geographical regions based on an established grouping [14].

Long read sequencing data
Four laboratory parasite strains (K1 Thailand, D10 Papua New Guinea, NF54 Africa, T996 Thailand) were cultured under standard conditions [15] at the LSHTM and DNA was extracted using the Genomic-tip 100/G kit (QIAGEN) according to its protocol. The DNA was sequenced on the PacBio RSII long read technology. Chromosome-wide assemblies of PacBio sequence data were also available for a further five laboratory strains (7G8 Brazil, DD2 Indochina, GB4 Ghana, HB3 Honduras, IT Brazil) and nine field isolates (GN01 Guinea, SN01 Senegal, CD01 Congo, ML01 Mali, GA01 Gabon, KE01 Kenya, SD01 Sudan, and KH01 and KH02 Cambodia) (accession number ERP009847) [16]. Raw PacBio sequencing data were analysed in the SMRT Portal software using the Hierarchical Genome Assembly Process (HGAP3) pipeline, resulting in corrected long reads for each sample, and total genome sizes of 24Mbp. Corrected reads were aligned to the 3D7 reference genome (v3.1) using NGMLR (v0.2.7) software with default settings. Structural variants were identified using Sniffles (v1.0.11) software with default parameters. Manual verification was performed through alignment of candidate regions with the Mauve software [17]. All of the PacBio strains with reported structural variants in the pfgch1 locus or its promoter region were used to validate any possible putative duplications found in the Illumina short read analysis, estimate breakpoints, and distinguish different types of amplifications identified.
Collectively, one of every five Southeast Asian isolates (23.1%) had a gene (gDupE, gDupF or gDupG) amplification, with the highest occurrence in Thailand and Myanmar (38.9% and 34.4% respectively). The gDupH is an amplification of the pfgch1 locus, but not its promoter, and was almost exclusively observed in African samples (n = 19, 0.9%; Cameroon, n = 12). The gDupF, gDupG and gDupH amplifications have been reported in Thailand [8]. While pDupI and pDupJ amplifications appear exclusively in Africa and consist of the promoter of the pfgch1 gene (chr. 12: 973,848-973,907; EPD promoter ID: CZT99398_1 [24]). The pDupI (�400 kb) amplification was discovered in Malawian isolates [1] and occurs with a high frequency in the African continent (West 18.4%, Central 21.1%; East 65.2%, Southern 85.9%). The pDupJ (�310bp) amplification is a novel modification reported in two isolates.
In 7G8 (Brazil), D10 (Papua New Guinea) and K1 (Thailand) long-read sequences, we detected large copy number variants involving the pfgch1 gene and its neighbouring loci (chr. 12 location: 7G8 946,264-978,039; D10 962,087-992,006; K1 961,455-980,630); however, no evidence of their existence was found in the clinical isolates using Illumina data (n = 4,134). Almost all of the reported breakpoints are located in non-coding regions and near to long repeats consisting of A and T nucleotides, confirming characteristics of copy number variant formation across the highly repetitive P. falciparum genome [25].

Copy number variant analysis and frequency in global isolates
Copy number variation in pfgch1 has a potential role in sustaining SP resistance, and we estimated the number of copies directly using relative sequencing coverage compared to chromosome 12-wide levels of non-amplified isolates (Fig 2). For field isolates with gDupA, gDupB, and gDupC, we observed at least two copies, but in gDupD found in Madagascar there was only one copy. For gDupE and gDupF, there tended to be greater copies in Southeast Asia (median 2.66) than in West and Central Africa (median = 2.16) (Wilcoxon P = 0.03), which is consistent with the higher number of copies found in T996 (Thailand, 4 copies) compared to GB4 (Ghana, 3 copies) strains. The median number of copies of gDupG was approximately one. For gDupH events, there were at least two copies in Central Africa (median 2.00) and at least three copies for West African countries (median 3.17), consistent with the four copies of the NF54 strain. It is possible that pDupI and pDupJ have a common origin, but inference is difficult as there are only two pDupJ amplifications. There were differences in the the number of copies of the pfgch1 promoter amplifications between regions in Africa (median: Southern 4.97, East 3.71, West 3.32, Central 2.41), consistent with differences in PacBio strains (copies: Kenya KE01 3, Mali ML01 2), and potentially linked to geographical differences in SP use.

Pfdhfr/pfdhps mutations and pfgch1 amplifications
It is known that the accumulation of pfdhfr and pfdhps mutations lead to greater resistance to the SP [26]. Moreover, the pfgch1 amplification was found to facilitate further development of highly resistant pfdhfr parasites by improving their fitness and increasing resistance [27]. Multiple pfdhfr/pfdhps mutants and different pfgch1 amplifications were frequently identified together, and there was a trend for those with amplifications to have a greater number of  Table). The gDupH amplifications in West and Central Africa appear predominantly with the quadruple mutant (IRNIS-ISGKAA 50.0%), but a significant number (25%) were identified on the sextuple background (IRNIS-VAGKGS 15.0%, IRNIS-VAGKAA 10.0%). The promoter amplification pDupI ML01 amplification type was predominantly in West Africa, on two highly frequent pfdhfr-pfdhps mutant backgrounds (IRNIS-ISGKAA 31.2%, IRNIS-IAGKAA 25.6%). The remaining promoter amplifications (pDupI KE01 and KE01/ML01) were located primarily in Southern, East and Central Africa, and predominantly accompanied by a quintuple IRNIS-IS-GEAA mutant (54.5%), as well as other frequent genotypes (IRNIS-ISGKAA 16.1%, IRNIS-IS-GEGA 8.7%).
Despite the sampling period and frame of the data being "convenient" in nature rather than systematic, a time point analysis was possible using those countries with more than one year of sampling. This provided evidence of no change or increase in the frequency of pfdhps/pfdhfr mutants. For example, there was a higher frequency of quintuple and sextuple mutants in Kenya and Tanzania at later time-points (S3 Fig). There also appears to be an increasing number of pfgch1 promoter amplifications in countries like Mali, Gambia and Kenya. In Kenya, Almost 75% of Kenyan samples were reported with quintuple mutants and more than 65% contain pfgch1 promoter amplifications (Fig 3). Similarly, for Tanzanian isolates, which were sourced between 2010 and 2013, 80% were found with quintuple or sextuple mutants and 65% reported promoter amplifications. In Tanzania

Links to other drug resistance loci
The patterns of genotypic chloroquine (pfcrt, pfmdr1) and artemisinin (pfkelch13) drug resistance were explored in relation to the presence of SP related pfdhfr/pfdhps mutations and pfgch1 amplifications (Table 3). Although, markers of chloroquine and artemisinin resistance are not in the folate pathway, they have been subject to directional selection, and an integrated analysis may reveal insights into the history of drug resistance in geographical regions. A high level of chloroquine resistance was observed across most of the geographical regions, except Southern Africa, where the Malawian population had almost no mutants and reversion to susceptible strains has been observed due to the early removal of chloroquine as a front-line antimalarial [1,28] (Table 3). In West and Central Africa, those parasites with pfgch1 gene amplifications (compared to promoter or no amplifications) had a lower prevalence of genotypic chloroquine resistance (Z-test P<2.2 × 10 −16 ) ( Table 3). The prevalence of genotypic chloroquine resistance in Southeast Asia is high (>80%) regardless of a pfgch1 amplification, but a higher prevalence is reported in samples with gene amplifications (compared to none; Z-test P<2.2 × 10 −16 ). As expected, artemisinin related mutations were only found in Southeast Asia, and a lower prevalence is associated with isolates with reported pfgch1 amplifications (compared to none; Z-test P<8.3 × 10 −11 ), but this analysis may be confounded by time of sampling. A temporal analysis revealed a decreasing frequency of chloroquine resistance related mutations (pfcrt K76T and I356T; pfmdr1 N86Y) in Eastern Africa (Tanzania: 2010  There were no strong signals of pfcrt resistance reversion in any of the Southeast Asian populations, but the data only spanned 2007 to 2014, and chloroquine is still accessible and used due to the continued presence of P. vivax [29].

Discussion
Monitoring of the polymorphisms in the P. falciparum genome associated with antimalarial drug resistance can be used to control and prevent malaria, support elimination strategies, and guide treatment choices. Surveillance is crucial for drugs like SP, which was replaced as a front-line treatment, and is now used prophylactically in pregnant women and children. Increasing numbers of pfdhfr and pfdhps resistance mutants due to selection by drug pressure have been observed, and increased copy number of pfgch1, which encodes the first and the rate-limiting enzyme of de novo folate biosynthesis, has been linked to SP resistance in Southeast Asia, particularly Thailand [8]. Adaptations in pfgch1 gene are thought to compensate for the lower fitness in parasites with three or more mutations in pfdhfr or increase resistant in parasites with fewer polymorphisms [27]. The function of the promoter duplication remains unknown, however, it is plausible that it performs a similar role as the gene amplification. Moreover, multiple copies of pfgch1 have been shown to have a direct association with point mutations in pfdhfr (I164L) or pfdhps (K540E) in Southeast Asia [9], which is confirmed in our work. Similarly, in Malawi and other countries in Africa which have had prolonged SP use, a pfgch1 promoter duplication has been identified in parasites with pfdhfr/pfdhps mutations, distinct from the whole gene amplification found in Southeast Asia.
Our analysis in 4,134 P. falciparum across 29 malaria endemic countries reveals a more fine-scaled picture of pfgch1 gene and promoter amplifications, reporting 12 different structural changes occurring in African or Asian populations. Using 18 strains with long read sequence data allowed us to robustly determine the breakpoints, including in regions of long mono-nucleotide (A or T) or AT/TA di-nucleotide repeats. The prevalence of amplifications was correlated with geography, with gene-and promoter-based types dominant in Southeast Asia and Africa, respectively. The divergence between continents and multiple types indicate an independent emergence of structural changes. Furthermore, the location of breakpoints in areas of mono-or di-nucleotide repeats, encourage even more distinctive variations due to an AT-rich genome of P. falciparum. Nonetheless, there were minor exceptions, including in Cameroon, DRC, Kenya and Ghana [10], where both promoter and gene amplifications were present. Further, the extent of amplification copy number appears to be linked to geography. For example, the Ghanaian (GB4) and Thailand (T996) strains had identical breakpoints, but the latter has one additional copy. Similarly, for the promoter amplification DupI, isolates from Southern and East African countries tended to have higher estimated copy number compared to the western and central regions of the continent. This regional difference is in part due to local evolution of strains exposed to SP (and other anti-folate drugs) over a longer period, resulting from differences in antimalarial drug implementation policy between countries.
Higher numbers of pfdhfr/pfdhps mutations were correlated with the presence of pfgch1 amplifications in both Africa and Asia. Whilst, the relatively low numbers of pfdhfr/pfdhps mutations in Papua New Guinea and South America were accompanied by the absence of pfgch1 amplifications. The promoter amplification (DupI) and quintuple pfdhfr/pfdhps genotype (IRNIS-ISGEAA) were linked in African populations, whilst the septuple (IRNL-S-ISGEGA) mutant was linked with gene amplifications (DupE and DupF) in Asia. The prevalence of pfgch1 gene and promoter amplifications and occurrence of pfdhfr/pfdhps mutants in Africa broadly overlaps with the duration and degree of SP usage as a first-line treatment. Malawi abandoned chloroquine early and has had the longest SP exposure with >10 years as a first-line treatment, and almost all isolates contained pfgch1 promoter amplifications, pfdhfr/pfdhps quintuple mutants and pfcrt wild-type alleles. Whereas, Kenya and Tanzania introduced SP five or more years after Malawi, and �65% isolates had promoter amplifications and a low prevalence of pfcrt resistance alleles (<40%). Further, across the seven years of data from Kenya, there was an increase in both the prevalence of pfgch1 promoter amplifications and pfdhfr/pfdhps quintuple mutants. In Southeast Asia the near fixation of chloroquine resistance alleles and higher levels of SP related pfdhfr/pfdhps mutations, were in tandem with 20% pfgch1 amplification frequency, particularly in Thailand and Myanmar. Interestingly, in Southeast Asia there was an allelic association between pfcrt I356T and pfdhfr I164L, most probably due to the high historical usage of both antimalarial drugs.
Overall, whilst there may be some limitations with the convenience nature of the P. falciparum sampling, the trend towards longer SP exposure and an increase in drug resistance polymorphisms makes it essential to monitor structural changes in the pfgch1 locus and the number of pfdhfr/pfdhps mutations as well as correlations between them. Pfgch1 gene expression may modify SP sensitivity [30], and further analysis studying the expression of different pfgch1 amplifications with relation to pfdhfr/pfdhps could provide additional insights into SP resistance. In lieu of further whole genome sequencing, drug resistance assays, and allelic manipulation of P. falciparum to understand functional mechanisms, our work has characterised new forms of pfgch1 amplification, which can be used as the basis of enhanced surveillance for SP efficacy.

Conclusion
The SP combination is still the only antimalarial drug treatment recommended by WHO for intermittent preventive treatment in vulnerable populations, because of its safety in pregnant women and infants and its long action. The selection of P. falciparum with pfgch1 amplifications may enhance the fitness of parasites with pfdhfr and pfdhps substitutions, intensifying the persistence of SP resistance, and potentially threatening the efficacy of this regimen for prevention of malaria in vulnerable groups. Our work has revealed new forms of pfgch1 amplification, which can be used for surveillance activities.