Genome-wide analysis of Schistosoma mansoni reveals limited population structure and possible praziquantel drug selection pressure within Ugandan hot-spot communities

Populations within schistosomiasis control areas, especially those in Africa, are recommended to receive regular mass drug administration (MDA) with praziquantel (PZQ) as the main strategy for controlling the disease. The impact of PZQ treatment on schistosome genetics remains poorly understood, and is limited by a lack of high-resolution genetic data on the population structure of parasites within these control areas. We generated whole-genome sequence data from 174 individual miracidia collected from both children and adults from fishing communities on islands in Lake Victoria in Uganda that had received either annual or quarterly MDA with PZQ over four years, including samples collected immediately before and four weeks after treatment. Genome variation within and between samples was characterised and we investigated genomic signatures of natural selection acting on these populations that could be due to PZQ treatment. The parasite population on these islands was more diverse than found in nearby villages on the lake shore. We saw little or no genetic differentiation between villages, or between the groups of villages with different treatment intensity, but slightly higher genetic diversity within the pre-treatment compared to post-treatment parasite populations. We identified classes of genes significantly enriched within regions of the genome with evidence of recent positive selection among post-treatment and intensively treated parasite populations. The differential selection observed in post-treatment and pre-treatment parasite populations could be linked to any reduced susceptibility of parasites to praziquantel treatment.

Introduction Schistosomiasis-also known as Bilharzia after its discoverer Theodor Bilharz [1]-is a neglected tropical disease that affects about 250 million people worldwide, most of whom live in low and middle-income countries (LMICs) [2]. To treat schistosomiasis, praziquantel (PZQ) is used for preventative chemotherapy by mass drug administration (MDA) [3] and has been used globally to treat schistosome infections since 1979 [4]. In Uganda, the ongoing use of PZQ in MDA started between 2002 and 2003 [3,5]. The objective of MDA in these settings has historically been to reduce the prevalence and intensity of infection and hence pathology; cure and elimination are not expected in the absence of additional interventions such as improving sanitation and snail control [6,7]. In the World Health Organisation 2021-2030 the goal has been set of reducing the proportion of people with high-intensity infections to < 1% and thereby to eliminate schistosomiasis as a public health problem in all countries in sub-Saharan Africa by 2030 [8]. The expectation is that this will be achieved primarily by increasing the frequency and coverage of treatment with PZQ-the sole drug commonly used for schistosomiasis MDAwhich could inadvertently increase drug selection pressure on parasite populations.
There is a growing body of evidence that MDA programmes may affect how parasite populations respond to treatment, for example, through reduced efficacy of PZQ in lowering egg output in communities that have received multiple rounds of PZQ MDA [9,10], but there is little evidence that this is a widespread phenomenon [11,12]. Reduced genetic diversity of parasite populations has also been associated with reduced susceptibility of the parasites to PZQ [13], with reports from Senegal having earlier linked such outcomes to potential drug resistance [14]. The development of drug resistance in natural populations would be a major health concern. Furthermore, in vitro studies have shown that resistance to PZQ can be selected for in S. mansoni [13,[15][16][17]. There is growing interest in understanding the impact of continued PZQ monotherapy on the parasite genome in order to detect the potential development of resistance to this drug as early as possible [18,19], and understand the mechanism(s) of resistance. One clue to resistance can come through understanding the mode of action of a drug.
The activity of PZQ has not been clearly understood, but recent findings suggest that the drug activates a schistosome Ca 2+ -permeable transient receptor potential (TRP) channel (Sm. TRPMPZQ) [20], hence making it the primary target for PZQ action on schistosomes. Recently, a genetic cross involving a schistosome line experimentally selected for PZQ resistance identified this TRP channel as likely responsible [21], but it is not yet clear that this locus is involved in variation in PZQ efficacy in the field. Other candidate genes have been proposed, for example the S. mansoni P-glycoprotein (smdr2), which shows increased protein expression in male worms following exposure to sub-lethal doses of PZQ [16]. Susceptibility of the parasites to PZQ might involve multiple interactions between the drug, the parasite, and the respective host.
Collecting high-resolution genetic data from parasite populations under drug selection pressure may lead to new insights into the mode of action of PZQ or the mechanism of potential resistance to the drug. Furthermore, population genetic data from parasite populations will also give insights into the population biology of the parasite. This is vital for understanding schistosomiasis epidemiology, transmission, disease severity and why certain communities might respond better to treatment than others, especially within regions where drug selection pressure is being applied [22]. While lower-resolution markers have been extensively used (e.g. [23,24]), much of our detailed understanding of schistosome population genetics has come from studies using microsatellite markers to describe the genetic structure of populations of S. mansoni [25][26][27] and other schistosome species [28,29]. This work has revealed genetic differentiation between parasite populations that are geographically separated (e.g [30][31][32][33]), but panmictic populations and very high within-host diversity within disease foci (e.g. [34,35]). The population genetics of African schistosomes has recently been reviewed [36]. Microsatellite markers have also been employed to investigate both basic questions about parasite biology (e.g. [37]) as well as more applied, operational questions about schistosome control [22]. In particular, a few studies have shown changes in genetic diversity of schistosomes with praziquantel MDA [4,38,39], but other studies have failed to find this effect [40] particularly with longer-term follow-up [41] suggesting any genetic response to treatment may be only temporary [42].
With their high levels of polymorphism, microsatellite loci are powerful molecular markers, but inevitably represent only a small proportion of the parasite genome. There is an increasing amount of genome-scale data available for schistosome populations. A number of studies have used exome capture [43] to describe introgression between Schistosoma species [44] and to study the historical demography of schistosomiasis in the Americas [45]. Restriction site-associated sequencing (ddRAD-seq) has been used to demonstrate strong genetic structure in remaining endemic hot-spots of S. japonicum transmission [46,47]. While providing high-resolution data in a cost-effective way, these reduced-representation sequencing approaches have some drawbacks, for example in identifying small haplotype blocks from ancient introgression [44]. Whole-genome data gives a more comprehensive picture of genetic variation, including non-coding variation, and so has the potential to provide more insights into understanding the population genetics of this species. While reference genome assemblies are available for a number of schistosome species [48-54], large-scale genome-wide variation data is only available from one S. mansoni population [55], with a number of other populations and other species most being represented by relatively few specimens [53, [56][57][58]. Efforts in elucidating the parasite population genetic structure have proven very helpful in understanding drug resistance or transmission mechanisms in other parasite species: most notably in the malaria parasite Plasmodium falciparum [59,60] for which very extensive genome data is available [61].
Within Uganda, the Lake Victoria Island Intervention Study on Worms and Allergy-related diseases (LaVIISWA) was a cluster-randomised clinical trial [62] examining the impact of intensive (quarterly) versus standard (annual) PZQ treatment. While the study was primarily designed to assess the impact of anthelmintic treatments on allergy-related outcomes, prevalence and intensity of S. mansoni was a secondary outcome with results suggesting a plateauing of infection, after an initial decline in intensively treated villages [63]. To assess this outcome, a pilot study in the fourth year of the LaVIISWA trial investigated cure rate and Egg Reduction Rate (ERR) [10]. A lower cure rate and ERR was seen among people receiving quarterly (intensive) treatment (n = 61; cure rate 50.8%, 95% confidence interval (CI): 37.7% to 63.9%; ERR 80.6%, 95% CI: 43.8% to 93.7%) than in those receiving a single annual standard dose (n = 49, cure rate 65.3%, 95% CI: 50.4% to 78.3%; ERR 93.7%, 95% CI: 84.9% to 97.7%) [10]. The WHO recommends an ERR of 90% for effective PZQ treatment [9,64]. While the sample size available precluded finding compelling statistical evidence, these results are suggestive of the first signs of reduced efficacy of PZQ treatment in the more intensively treated population, and that the plateau in reduction of infection during the intervention study could be due to PZQ resistance. These islands thus represent a 'hot spot' in which high baseline prevalence [62] of schistosomiasis has persisted despite multiple years of treatment [10,65].
Here, we sought to establish genome-wide data on the population genetics of parasites present in this study population, with the ultimate goal of assessing the effects of MDA on parasite genome evolution. We take advantage of the opportunity to investigate these in the context of a randomised intervention trial within a defined geographical area, allowing us to compare the effects of geographical isolation and treatment intensity on genetic variation in this population. By comparing samples taken immediately before and after a treatment round at the end of the LaVIISWA study in the two treatment arms and for multiple villages (the level of randomisation in the study), we can investigate whether the genetic impact of a single treatment dose varies with history of drug exposure. Building on the evidence that there may be differences in treatment efficacy between treatment arms, we investigated whether the signatures of natural selection across the genome differ with previous drug exposure. We also compare these data with recently published genomic data from other Ugandan S. mansoni populations.

Ethics statement
This work was not expected to result in any harm to participants. Ethical approval was given by the Uganda Virus Research Institute (reference number GC127), the Uganda National Council for Science and Technology (reference number HS 1183) and the London School of Hygiene & Tropical Medicine (reference number 6187). As previously detailed [62], written informed consent was received from all adults and emancipated minors and from parents or guardians for children; additional assent was obtained from children aged �8 years.

Sample selection and study site
Participants were selected from four villages each from the standard and intensive treatment arms from among the 27 study villages of Lake Victoria Island Intervention Study on Worms and Allergy-related diseases (LaVIISWA) trial [62,63] at the end of its fourth year. The participants involved children and adults as previously described [62]. The LaVIISWA trail was registered as ISRCTN47196031 (https://trialsearch.who.int/?TrialID=ISRCTN47196031), and the results of the trial have already been reported elsewhere [63,66]. The villages in the standard arm received PZQ once a year while those in the intensive arm received PZQ four times a year during the LaVIISWA trial period. The standard villages sampled were Kakeeka, Kachanga, Zingoola and Lugumba. The intensive villages were Busi, Kitosi, Kisu and Katooke (Fig 1). The map in Fig 1 was prepared using the R packages tmap v.3.3 [67] and sf v 1.0 [68] using administrative boundaries from OCHA Eastern region from https://data.humdata.org/ and public domain data on roads (v5.0.0), rivers (v5.0.0), lakes (v5.0.0) and urban areas (v4.0.0) from https://www.naturalearthdata.com.
Sample selection and collection was carried out as previously described in the parasitological survey [10]. The stool samples (collected from participants who tested positive for urine CCA) were processed for two Kato Katz slides as previously described [10] and miracidia hatching provided suitable material for DNA extraction. Participants were then treated, under observation, with a single dose of PZQ at 40 mg/kg (estimated by height pole), in accordance with the trial MDA procedures. Individuals whose pre-treatment sample tested positive for schistosome eggs by Kato Katz were followed up after four weeks and both Kato Katz and miracidia hatching were repeated. Miracidia hatching was carried out from each of these participants and the resultant miracidia were stored on Whatman FTA cards until DNA was extracted.

Miracidia hatching
Miracidia hatching was carried out following previously described protocols [31]. In brief, the stool sample was homogenised through a metal sieve, then further washed and filtered using a Pitchford funnel assembly [69] consisting of a 40 μm sieve placed inside a 200 μm outer sieve. Stool samples were washed using deionised water (Rwenzori Bottling Company, Uganda). The concentrated S. mansoni eggs were transferred to a Petri dish in clean water and exposed to indirect sunlight to induce the hatching of miracidia. Hatching was performed in natural light (environmental conditions) with intervals of exposure to sunlight and cover depending on weather conditions. The time taken for miracidia to emerge varied between samples, so the Petri dishes were intermittently checked for the presence of miracidia for a maximum of 48 hours. Miracidia were picked in 1.5-5μl of water and then transferred to a second dish of deionised water to dilute bacterial contamination before being placed on Whatman indicating FTA cards (Qiagen) and left to dry. The FTA cards were wrapped in aluminium foil to keep them away from continued direct light and placed in ziplock bags with silica gel in a cardboard drawer.

Whatman FTA DNA Extraction
DNA was extracted using a modified CGP buffer protocol as previously described [70,71]. The individual spots containing miracidia were punched from the FTA cards using a 2 mm Harris micro-punch and placed in 96-well plates. Protease buffer was prepared using Tris-HCl pH 8.0 (30 mM), Tween 20 (0.5%), IGEPAL CA-630 (0.5%), protease (1.25 μg/ml; Qiagen cat #19155) and water. Digestion was done by adding 32 μl of the protease buffer to each of the wells on the 96-well plate containing the punched spots from the FTA cards. The plate was vortexed to mix and spun down before incubation at 50˚C for 60 min, 75˚C for 30 min. Miracidia lysates containing DNA were transferred to a new labelled plate and stored at 4˚C until used.

Library preparation and sequencing
DNA sequencing libraries were prepared using a protocol designed for library preparation of Laser Capture Microdissected Biopsy (LCMB) samples using the Ultra II FS enzyme (New England Biolabs) for DNA fragmentation as previously described [71]. The LCMB library preparation method is optimised for uniform, low-input samples. A total of 12 cycles of PCR were used to amplify libraries and to add a unique 8-base index sequence for sample multiplexing. The LCMB library preparation protocol is optimised for uniform, low input samples. A total of 174 samples from 96 individuals in 8 villages were sequenced on two NovaSeq lanes, 108 on one lane and 66 on another lane. These 174 samples were chosen as having more than 10% of reads mapping to S. mansoni based on preliminary low-coverage genome sequencing of all 214 samples collected in the field.

Mapping and SNP calling
The reads were mapped to the S. mansoni reference genome v7 (GCA_000237925.3) [54] using the BWA-MEM algorithm in Burrows-Wheel Aligner software (BWA) (VN:0.7.15-r1140) to produce SAM files which were then converted to BAM format using Samtools v1.14. Scaffolds labelled as haplotype variants in the assembly were removed before read mapping to improve mapping accuracy, using the approach previously described [55]; see [55] also for a full list of removed scaffolds. PCR duplicate reads were identified using Picard v1.92 [72] and flagged as duplicates in the BAM file.

Identification of population structure
The three islands on which the population structure was assessed were Koome, Damba and Lugumba in the Mukono district of Uganda. An outgroup made up of inland and shoreline samples was also included, consisting of 27 samples collected in a previous study [9] and for which whole-genome sequence data were recently published [55] from Tororo and Mayuge districts in Eastern Uganda. Tororo and Mayuge are approximately 120 km apart. Mayuge district is a shoreline district located about 100 km from Mukono district. Both districts are located in south eastern Uganda, with Tororo being the inland district (Fig 1).

Test for genetic differentiation
The fixation index (F ST ) statistic was calculated between each of the villages across the different islands and treatment groups (standard, intensive, pre-treatment and post-treatment) to measure population differentiation due to genetic structure. The F ST was calculated using vcftools (version 0.1.15) [55] on the vcf file containing biallelic filtered SNPs. Mean F ST was calculated from genome-wide weighted F ST values with 99% symmetric bootstrap confidence intervals calculated using R version 3.5.1 (2018-07-02). We fitted a gravity model as logðN m Þ ¼ logðgÞ þ a:logðP i :P j Þ À y:logðD ij Þ þ ε; where N m is an estimated number of migrants per generation, calculated from the F ST between villages as: And D ij is the linear distance between the villages. In (P i .P j ), P i and P j represent the population sizes of the two villages compared. Models were fitted using R version 4.02, with the MuMIn package v1.43 to assess model importance.

Nucleotide diversity
Nucleotide diversity (pi, π) was computed from high-confidence bi-allelic filtered SNPs using vcftools 0.1.15 [73]. The genome-wide nucleotide diversity was calculated from a list of positions for each of the time points (pre-and post) and treatment groups (standard and intensive) using the option in vcftools '-site-pi' respectively. The average nucleotide diversity within each of the groups was calculated individually and the symmetric 99% bootstrap confidence intervals of the averages were estimated using R version 3.5.1. Statistical significance of differences between group means was assessed by whether the confidence interval for one mean was disjoint from the mean of other groups. Effective population size (N e ) was estimated from nucleotide diversity using the relationship π = 4.N e .μ [74] with the mutation rate 8.1 × 10 −9 [57].

Determination of rare allele sharing and kinship analysis
To identify the pairwise rare allele sharing we used a Perl script from Shortt et al.
[47] available at https://github.com/PollockLaboratory/Schisto. We filtered for minor allele frequency � 0.1 and sampled 500 SNP sites in 30 different generations. We then computed the mean value from the 30 generations for each pair. Allele-sharing scores were visualised in R version 4.0.2 using igraph v1.2.6 [75]. Significance of differences in mean allele sharing between groups were calculated against a non-parametric null distribution for each comparison generated by randomly permuting group labels 1000 times and calculating differences in mean allele sharing for each permutation. We also inferred relationships between miracidia samples using Sequoia version 2.3.5 [76] in R 4.0.2, using the GetMaybeRel command to identify related pairs with both the default genotyping error rate of 1x10 -4 and a higher error rate of 0.05. Sequoia uses a computationally intensive likelihood procedure that limits the number of SNP variants that can be analysed, so we used a heavily thinned dataset of 3,829 variants such that all variants had allele frequencies of at least 0.1, were called in 80% or more of the samples and were at least 100,000 bp apart.

Test for selection
To test for recent positive selection within the treatment arms and between pre-treatment and post-treatment, the cross-population extended haplotype homozygosity (XP-EHH) test [77] was performed. XP-EHH is designed to detect whether either an ancestral or derived allele is undergoing selection within a given population. The XP-EHH test has the power to detect weaker signals of selection as it compares two closely related populations giving a directional score. The XP-EHH detects selective sweeps in which the selected allele has approached fixation in one population but remained polymorphic in another population. A VCF file containing only bi-allelic SNPs was subset into respective chromosomes. A genomic linkage map for each of the chromosomes was computed for each individual chromosome using the adjusted map length in centimorgan (cM) for the respective chromosomes

Estimate of per-individual egg reduction rate (ERR) and association test
The posterior distribution for the ERR based on data from each individual for whom both preand post-treatment egg count data were available was estimated using a generalised linear mixed-effect model [82], incorporating nested random effects for treatment arm, village and individual. Means of the marginal posterior distribution per individual were used as quantitative phenotypes for an association study, in which a single miracidium was chosen at random for each individual for which an ERR was calculatred (N = 44) testing all 6,932,161 SNP sites that varied in this subset. The model used was a linear regression of each SNP genotype against mean ERR, using 20 principal components as covariates to control for population structure, calculated using the '-linear' and '-pca' flags in plink v1. 9 [83]. Code for these analyses is available at https://github.com/jacotton/LaVIISWA_genomes.

Population stratification
After filtering, 6,967,554 high-confidence SNPs were retained (out of 18,716,072 unfiltered SNPs) from 174 individual miracidia. Sample metadata and genetic variation data is available to view and analyse in an interactive web application [84] at https://panoptes.schisto.watch/ index.html. Principal components analysis of these high confidence SNPs showed little genetic structure within the island parasite populations on the first four principal components, which together represent 49% of the genetic variation. In particular, we found no evidence of population stratification between the standard (annual treatment) and intensive arms (quarterly treatment) or between pre-and post-treatment samples from these fishing communities (S2 Fig). The shoreline samples (from Mayuge district) clustered more closely with the island (Mukono district) parasite populations as compared to the inland samples (from Tororo district) (Fig 2), but inland parasites were distinct from most island samples on principal component 3. The large-scale geographical pattern reflects the known genetic differentiation between inland and shoreline populations [55]. We also found that the island population is strikingly more diverse than either of the other populations (Fig 2). While this is partly due to the larger number of samples included here, a larger sample of the shoreline and inland populations studied elsewhere also did not appear as diverse as the island population [55]. A number of miracidia appeared quite distinct from the main cluster of individuals on principal component 2. These divergent parasites were mostly (8 out of 9) from the islands and came from four different villages (Busi, Kakeeka, Zingoola, Katooke), with one from a shoreline village (Bwondha; S4 Fig). Although participants were all resident in the villages throughout the LaVIISWA trial for at least 3 years before this study, there is a great deal of migration to the islands from other parts of the shoreline of Lake Victoria, including Kenya and Tanzania; therefore, we suspect these miracidia represent parasites imported from other populations that we have not sampled here.

Rare allele sharing and kinship analysis
To investigate direct relatedness between individual parasites, we adopted an approach based on determining the level of sharing of rare alleles (defined by their population frequency being less than or equal to 10%) between samples [47]. This approach has recently been used to study S. japonicum populations in China with whole-genome data [58]. By definition, most unrelated individuals share very few rare alleles; here we found slightly higher average proportion of rare-allele sharing between pairs of miracidia isolated from the same individuals (0.1028) than in other comparisons (from the same village 0.0874, between villages on the same island 0.0862, between islands 0.0856). Differences between average allele sharing proportions were significant for comparing infrapopulation and within village groups (observed difference 0.0154, p-value from permutation test p<0.001) and within-village to within-island PLOS NEGLECTED TROPICAL DISEASES groups (observed difference 0.0012, p~0.002) but only marginally so for within-island and between-island comparisons (observed difference 0.0006, p~0.011). These data suggest an increase in relatedness within populations and possibly some geographical signature of increased relatedness.
We found three miracidia collected from the same infected individual at the same time with rare allele sharing of at least 0.3, and used these to calculate the average allele sharing for firstdegree relatives (full siblings or parent-offspring) of 0.403 (actual values 0.4, 0.4014 and 0.409). This value is slightly lower than the theoretically expected level of identity-by-descent of 0.5, but with only three observations it is not possible to exclude that this difference is due to chance. Similarly, the average rare allele sharing for pairs of miracidia from different islands was 0.086, which represents our best estimate for the level of sharing in unrelated individuals. The small number of miracidia available for putative first-degree relatives means the observed variance from our data is very small, and so our final classification is thus deterministic. We classified miracidia pairs sharing more than 0.3105 of these rare alleles as first-degree relatives, 0.1981-0.3104 as second-degree relatives, 0.1419-0.1980 as third-degree relatives, 0.1138-0.1418 as fourth-degree relatives and 0.0956-0.1138 as fifth-degree relative, while those with less than 0.0956 sharing were classified as unrelated.
While 24% of pairs of samples from the same individual were classified as being related, only 10% of other comparisons appeared related (Fig 3A and 3B; χ 2 = 21.785, 1 df, p = 3.05 x 10 −6 ), and a similar pattern held for close relatives (Fig 3A and 3B; first and second degree relatives represented 4% of within-infrapopulation comparisons, but 0.05% of all comparisons; χ 2 = 183.37, 1 df, p < 2.2 x 10 −6 ). There was no significant enrichment in related pairs of miracidia with either treatment intensity or for samples collected pre-and post-treatment. We found five pairs of first-degree relatives in total ( Fig 3C); but one pair were from different islands (marked 1 on Fig 3C) and a second pair were from different individuals sampled on consecutive days in Kakeeka village (marked 2 on Fig 3C). On the face of it, this would imply that the same combination of clonal cercariae infected these people, which seems very unlikely-particularly for the geographically separated cases. We cannot exclude the possibility that either the high level of rare allele sharing is misleading in these cases, or errors in sample identification. The remaining three pairs of first-degree relatives were pairs of miracidia sampled from single individuals in Busi, Kitosi and Kisu villages on consecutive days. Interestingly, a miracidium sampled from the same individual in Kisu was a second-degree relative of the first-degree pair, but this was collected post-treatment 37 days later. This is one of only 8 pairs of second-degree relatives. This suggested that either an adult worm survived treatment but changed 'partners' (to produce a half-sibling or avuncular relationship) during this period, or two clonal worms with genetically distinct partners were present in this host at the two timepoints and produced these miracidia.
We tested the accuracy of these inferred relationship using Sequoia, an independent likelihood-based method [76] that will identify first and second-degree relatedness, using a thinned subset of 3829 variants. With the standard genotyping error rate (1x10 -4 ) this approach identified only a small number (3) related miracidia, but with a more liberal error rate of 5% this approach identifies 30 pairs of related miracidia, and shows very good agreement with the inferred relatedness using the allele-sharing approach above. Sequoia identified all 5 firstdegree relationships as full sibs, and 6 out of 8 second-degree relationships identified by allele sharing were also inferred to be second-degree relatives by Sequoia. A relatively small fraction of third-degree (15 our of 101) and fourth-degree (4 out of 427) relationship assignments from allele sharing were also assigned as second-degree relationships by Sequoia.

Analysis of genetic differentiation between villages
To further investigate genetic structure within the island population, we calculated F ST (the proportion of genetic variation explained by population structure) for each pair of villages. As we expected, F ST between villages was very low (maximum 0.0067), indicating little or no geographic structure to our data. We observed higher genetic differentiation between villages on different islands compared to those within the same island, but the small number of pairwise comparisons (N = 8 villages, 28 pairwise comparisons) meant that we did not have sufficient statistical power to detect any difference (p = 0.082, 1-way ANOVA of between/within village vs F ST ). The villages were between 1 and 13 km apart, but there was no significant relationship between the distance between villages and F ST (Fig 4). To explore the geographical structure in these data more fully, we also fitted a gravity model attempting to explain F ST between each pair of villages by the distance between villages, the population of each village and a factor capturing the effect of being on the same island. In this model, none of the explanatory variables had a significant influence on F ST , but the location of villages on the same island vs different islands was the most important variable with a likelihood weight in the best-fitting models of 0.48, while 0.31 for linear distance between villages and 0.23 for the product of village populations.

Within-population genetic diversity
When comparing all pre-and post-treatment samples, we observed a very small but significant difference (p < 0.01) in genetic diversity between samples taken before and after treatment, with the 99% confidence interval for the mean nucleotide diversity in pre-treatment samples not overlapping with the mean post-treatment nucleotide diversity. This is consistent with a

PLOS NEGLECTED TROPICAL DISEASES
small effect of a single PZQ treatment round on the parasite population (Table 1). There was also lower diversity in parasites collected from villages in the intensive arm of the study than in the standard arm (Table 1), possibly reflecting a longer-term effect of more frequent PZQ treatment in these locations, despite the high levels of gene flow apparent between these locations implied by the very small levels of genetic differentiation we report. While this trend was consistent in both pre-and post-treatment samples, the difference between trial arms was most pronounced in post-treatment populations (Table 1). These diversity values are very similar to those observed in a recent study of the parasite populations on the lake shore and inland sites [55]. Using the mutation rate estimated previously [57], this implies an effective population size of around 10 5 individuals from this sample collection, just outside the upper confidence limit of the estimate for the East Africa population in the previous study (3.67-9.35 x 10 4 ) [57], and much higher than estimates from individual schools on the Lake Victoria shoreline (3.30-3.69 x 10 4 ) [55], highlighting the diversity of S. mansoni parasites present on the islands.

Analysis of signatures of selection
We used the XP-EHH test to identify genomic regions under differing selection pressures in separate comparisons between standard and intensive treatment arms ( Fig 6A) and between pre-and post-treatment samples (Fig 6B). Taking extreme XP-EHH scores of < -2 or > 2 as a cutoff, we identified 510 windows as outliers including 12.75 Mb or 3.1% of the genome in total and representing 123 contiguous regions. None of the windows from either comparison overlap the peak of differentiation between standard and intensive treatment populations on chromosome 5. We note that the Z chromosome was particularly enriched for windows with extreme XP-EHH scores, containing almost half of those found genome-wide (5.325 Mb). This could be a technical artefact caused by difficulty in mapping to a highly repetitive chromosome [54], or due to the smaller average population size or a stronger effect of selection on recessive alleles when hemizygous. There are also a number of reasons to expect sex-linked genes to frequently be under selection [85]. An increased variance in XP-EHH scores is apparent specifically in the Z-specific region (Fig 6) of the assembly scaffold representing the Z chromosome [54]. This region is not more repetitive than the autosomes or the pseudo-autosomal region shared by Z and W (see Table S12 of [54]), but is at lower copy number in the population as it is present in a single copy in female worms, so we suspect this enrichment of extreme XP-EHH scores represents a population genetic effect rather than a technical artefact. Some areas of the Z-specific region (6 contiguous windows, total length 300kb) show extreme scores in both comparisons between treatment arms and pre-and post-treatment samples.
There were 107 genes overlapping the outlier windows in the post-treatment samples, which were enriched for genes associated with seventeen GO terms for molecular function and biological processes (S3 Table; https://biit.cs.ut.ee/gplink/l/wWE3Rp-ASq). Only 53 of these genes were on autosomes (leaving 54 on the Z and/or W chromosomes), and no GO terms were enriched when considering just the autosomal gene subset. No statistically significant enrichment for any functional category was observed among the genes undergoing stronger selection in pre-treatment individuals. Functional profiling showed that the 132 genes (78 autosomal) under stronger selection in the intensive arm were significantly enriched for association with 10 GO terms (S3 Table; https://biit.cs.ut.ee/gplink/l/1VaAMWpxQK), which remained enriched in the autosomal subset. 88 genes were found in 46 autosomal windows with extreme XP-EHH values suggestive of stronger selection in the standard treatment arm include a pair of adjacent closely related genes likely to be a recent tandem duplication and possessing nucleoside deaminase activity on chromosome 4; these genes represent the only significantly enriched GO terms in this comparison (S3 Table; https://biit.cs.ut.ee/gplink/l/ 4Fz7ZA3hTC).

Individual egg reduction rate phenotypes
In an attempt to identify a phenotype for drug efficacy, we estimated the egg reduction rate (ERR) for 88 individuals for which genomic data was available and that had Kato-Katz egg counts taken both before and after treatment using a Bayesian linear mixed-effect model [82] that has previously been used to assess praziquantel efficacy [9,86]. Previous analysis revealed a lower but not significant ERR in the intensive arm than the standard arm [10]. Similarly, we observe lower marginal ERR in samples collected in intensive than standard treatment villages, but with largely overlapping posterior distributions (Fig 7B), and while villages vary in ERR PLOS NEGLECTED TROPICAL DISEASES (Fig 7C), there were similar numbers of high-and low-clearance villages in the two arms. These differences were largely driven by a small number of individuals in some villages with very low (even negative-implying a higher egg count after treatment than before) ERR values (Fig 7A). Unlike in a previous study [9], no ERRs were significantly below the 90% threshold,

PLOS NEGLECTED TROPICAL DISEASES
probably because only duplicate counts were available before and after treatment here, so there was significantly less information to estimate ERR on a per-individual basis.
Despite the small sample size meaning our study is likely to be under-powered to detect any but the most highly penetrant high-frequency alleles, we attempted to identify genetic variants associated with differences in ERR, testing the 6.95 million high-quality SNPs found on the 7 autosomes or on the shared ZW scaffold. The smallest p-value for any SNP was 1.393x10 -9 , which after adjustment for multiple testing represents an adjusted p-value of 0.01791 (S3A Fig). However, there was some evidence that p-values are systematically biased in this analysis (S3B Fig). Correcting for population structure based on PCA coordinates removed the significance of hits (lowest p-value = 4.3x10 -8 ; adjusted p-value = 0.274). The most significant hit (SM_V7_7:1552004) is in an intron of Phosphatidylinositol 4-kinase (Smp_130340) and has no obvious link to praziquantel mode-of-action. We thus conclude that there is no strong evidence linking any individual genetic variant in these data to variation in estimated ERR.

Discussion
Schistosomiasis is second only to malaria in socio-economic impact among parasitic causes of morbidity and mortality [87][88][89]. MDA is the main method for schistosomiasis control, and there is currently an effort to expand the coverage of community-wide drug treatment to improve morbidity control [90] and address the persistence of schistosomiasis in some areas despite many years of PZQ distribution [91]. Understanding whether intensive treatment for individuals living in high transmission communities has an impact on parasite populations, potentially leading to drug resistance is of high importance for public health among schistosomiasis endemic communities in Africa. Determining the genetic basis of any drug resistance that does emerge is also crucial for tracking the spread of resistance through schistosome populations and for future drug or vaccine development designed to circumvent resistance as has been demonstrated for oxamniquine resistance [92].
Here, we have taken advantage of a large-scale trial in which the entire communities of 26 fishing villages were regularly treated with PZQ. A number of features of the study made this an ideal place to detect an effect of PZQ treatment on parasite populations. Villages were assigned randomly to treatment arms, so treatment frequency was independent of morbidity, parasite prevalence or intensity. Treatment was given under direct observation, avoiding issues with drug compliance reported in other studies [5]. The four week follow-up interval posttreatment would minimise the possibility of diagnosing newly acquired infection after treatment based on the development time of S. mansoni [93], as a new infection would take longer than four weeks to result in egg production that could be detected by Kato Katz and microscopy [94]. The exception would be if, during the time of treatment, a patient had juvenile worms as these would not have been cleared by treatment [95]. We expected that as the study was based on a group of islands it might help isolate the parasite population and so allow us to detect drug-induced selection in this population without the confounding effect of high levels of gene flow from untreated populations. Only individuals who had lived in these villages for at least three years were included in the study to control for absenteeism and MDA compliance although it was still not possible to control for movement between villages and islands given that fishing is the main economic activity within these communities.
The population of parasites present on the islands is closely related to that recently described from communities on the shoreline of Lake Victoria, and as expected rather divergent from that inland from the lake (Fig 2A and 2B) [55]. This presumably reflects greater movement of people between the shoreline and island than with the inland populations, as well as that the inland population included here is further (approximately 160 km) from the shoreline than are the islands (approximately 80 km). We see little genetic differentiation between villages on the same island, as fishing villages are close to one another (1-13 km apart) and movement may be frequent among fishermen and village communities. Less expected was that we see little or no genetic differentiation between islands, with only a weak trend for greater genetic differentiation between villages on different islands than villages on the same island, albeit this is a larger effect than either the distance between villages or the size of village populations, perhaps suggesting that snail vector movement around the coasts of islands may play a role in parasite movement. The islands are separated by water that is deep enough [96] (primary data at http://dataverse.harvard.edu/dataverse/LakeVicFish) to prevent snails moving actively from one island to the next, but parasites could travel through movement of infected people or through infected snails being carried on fishermen's or conceivably by rafting on floating plants such as water hyacinth. Geographical conditions on these islands are similar except for Lugumba Island which has more rocky/stony shores compared to Koome and Damba which have more sandy shores and more vegetation, so we would expect snails to be able to establish similarly at most locations.
Despite seeing little or no genetic structure in the island parasite population, we see some evidence that PZQ treatment has had a small effect on the genetic diversity of the parasite population in this area. While we do not have baseline samples from before any PZQ treatment was administered as part of the LaVIISWA trial, we see very slightly higher genome-wide genetic diversity in the standard treatment arm than in the intensive arm, as would be expected if intensive treatment has been more effective at reducing the parasite population than the standard treatment regimen [10], although the effect we observe is very small and so maybe of limited biological relevance. Differences in the same direction were present when comparing subsets of samples taken before and after treatment separately, and was more pronounced in the post-treatment populations. As we see only very few closely related parasites, and no significant enrichment in relatedness based on treatment arm or sampling time with respect to treatment, it seems that this effect is unlikely to be due to differences in the number of directly related miracidia. We observe little or no genetic differentiation between villages in the two study arms, and only very slightly higher differentiation between the arms in post-treatment than in pre-treatment samples.
Evidence that PZQ treatment has some effect on the parasite population led us to investigate whether particular variants might be related to exposure to PZQ and so potentially responsible for any reduced susceptibility of parasites to PZQ within MDA programs [9]. We identify several regions within the genome that were highly differentiated between samples from the standard and intensive arms of the study, including a particularly striking region on chromosome 5 that showed high differentiation between post-treatment samples from the two arms of the study. This region contained a number of genes with functions that could be potentially linked to PZQ drug action. These include an ATP-binding cassette (ABC) transporter-associated gene (Smp_136310) that has previously been linked to helminth detoxification and drug resistance processes [97]. A gene with calcium dependent/modulatory functions (Smp_347070) was also found in the enriched region on chromosome 5, which is of interest given that the mode of action of PZQ has long been linked to increased permeability of the cell membrane to calcium ions into the cells which then causes contraction, paralysis and eventual death of the worms [98]. We also investigated regions of the genome under different selective regimes either with treatment intensity or when comparing pre-and post-treatment samples. Among the genes under varying selection were purine-nucleoside phosphorylase activity associated genes (Smp_197110 and Smp_171620) which are involved in the nucleotide salvage pathway of S. mansoni. Given that S. mansoni depends entirely on the salvage pathway for its purine metabolism [99], there is a possibility that ongoing non-random selection within this gene might affect parasite metabolic processes and a potential future drug target. However, we note that many biological processes could be contributing to genetic variation between samples from natural populations apart from variation in drug susceptibility [100]. While this study has shed some light on possible drug resistance genetic markers, other approaches, such as genetic crosses between parasites [101,102] from natural populations that vary in drug efficacy or from lines selected for resistance [21,92], are likely to have more power to reveal the genetics of drug resistance and so enable more focused studies of the effect on treatment on parasite populations.
A limitation in this study was that we did not have parasite populations sampled several years apart since it has been observed in similar studies that differentiation occurs over time in a given community [38], so sampling over a longer time-span could provide stronger evidence of genetic change in the population. In particular, we would ideally have access to baseline samples from the same population taken prior to any large-scale PZQ treatment being administered. Despite the falling cost and rising throughput of nucleic acid sequencing, we were limited in the number of miracidia that we could sequence in this study. An additional limitation is the labour-intensive process of hatching and washing miracidia necessary to obtain highquality data due to the non-selective nature of the whole-genome sequencing approach [70].
As control programmes expand and reduce pathogen populations, we would expect the genetic diversity of these populations to fall to reflect the reduced population size [13,41,103], and drug resistance to be reflected in particular genotypes being over-represented in samples collected after large-scale treatment has been applied. As in other recent studies [42, 55], we find evidence of at best a very limited effect of PZQ treatment on schistosome populations either post-treatment or over a longer time frame of intensive treatment with a large diverse parasite population and only minimal population structure. In most previous studies, extensive refugia from treatment have been present in the community, as only school-age children are routinely treated in most areas, so it is instructive that we find similar results in this study despite community-wide treatment. Taken together, this body of work suggests that PZQ treatment may be some way from having a dramatic effect on schistosome transmission in areas of high infection prevalence, suggesting that additional interventions may be important in reducing transmission [91, 104,105].
We see little evidence that PZQ treatment in these areas is strongly selecting for resistance. One possibility is that this study-and others-are underpowered to detect the early signs of selection, as might be expected if multiple resistance-conferring alleles are present in the population [106] or are present on many different genetic backgrounds in this large population, so that any selection for resistance is a very soft selective sweep [107] and so difficult to detect. While there is some evidence for reduced efficacy of PZQ in Uganda [9], other studies do not find this effect [11], including one study based on the same population as studied here [10]. Perhaps the most parsimonious explanation is that low efficacy, where reported, is not a sign of PZQ resistance, but due to host or other factors [91,108]. If persistent schistosomiasis hotspots are due to factors other than PZQ efficacy, this should reassure control programs that continued PZQ MDA will continue to play an important role-alongside other interventionsin reducing infection prevalence in these foci. Even in the absence of drug resistance emerging in natural populations, high-resolution genetic surveillance of African schistosome populations is ideally suited to detect changes in parasite population structure related to the impact of control measures [30], and could ultimately inform approaches to eliminate schistosome morbidity in remaining 'hot-spots' by helping us understand parasite transmission between hosts and between foci [91].
In summary, we demonstrate a small but significant effect of both short-term PZQ treatment intensity and a recent treatment episode on genome wide-diversity in a schistosome. This reduction in diversity does not appear to be associated with enrichment of closely related parasites, but rather could reflect ongoing non-random recent selection within these fishing communities in Uganda that might be under the influence of continued mass drug administration. We identify genomic windows that are either particularly differentiated following treatment or appear to be under differing selective regimes with different treatment intensity. These regions could include genes involved in drug response, but additional data is needed to prioritise candidates for further investigation.