Genome-Wide Analyses of Individual Strongyloides stercoralis (Nematoda: Rhabditoidea) Provide Insights into Population Structure and Reproductive Life Cycles

The helminth Strongyloides stercoralis, which is transmitted through soil, infects 30–100 million people worldwide. S. stercoralis reproduces sexually outside the host as well as asexually within the host, which causes a life-long infection. To understand the population structure and transmission patterns of this parasite, we re-sequenced the genomes of 33 individual S. stercoralis nematodes collected in Myanmar (prevalent region) and Japan (non-prevalent region). We utilised a method combining whole genome amplification and next-generation sequencing techniques to detect 298,202 variant positions (0.6% of the genome) compared with the reference genome. Phylogenetic analyses of SNP data revealed an unambiguous geographical separation and sub-populations that correlated with the host geographical origin, particularly for the Myanmar samples. The relatively higher heterozygosity in the genomes of the Japanese samples can possibly be explained by the independent evolution of two haplotypes of diploid genomes through asexual reproduction during the auto-infection cycle, suggesting that analysing heterozygosity is useful and necessary to infer infection history and geographical prevalence.


Introduction
The helminth Strongyloides stercoralis, which is one of the most common and globally distributed human pathogens of clinical importance, infects 30-100 million people worldwide [1,2]. This parasite most often resides in areas with tropical or subtropical climates and less frequently in areas with a temperate climate. It occurs infrequently in societies where faecal contamination of soil or water is rare, and therefore, new infections are very rare in countries with developed economies [3]. However, infection can persist for life unless effective treatment eliminates all adult parasites and migrating auto-infective larvae. Therefore, carriers are present in developed countries, representing a potential risk of horizontal transmission among humans [4]. Strongyloides stercoralis is also a natural parasite of dogs [5].
Strongyloides stercoralis is the only medically important nematode that can multiply in the host via an auto-infection cycle to reach critical levels and cause death [1,6,7]. The complex life cycle includes sexual and asexual reproduction. Infection with S. stercoralis begins when the infective third-stage larvae (iL3) in soil attach to and penetrate the human skin. After reaching the lung through the bloodstream, the parasites ascend to the trachea, and are swallowed to settle in the small intestine (their final destination) where the parasitic adults produce eggs through parthenogenesis. The larvae passed in the host faeces develop via either the homogonic route into iL3 forms or the heterogonic route into free-living adult stages that reproduce sexually outside the host. Although most eggs/larvae of the parasite are excreted from the host with faeces, homogonic larval development may occur inside the small intestine giving rise to auto-infective L3 which penetrate the intestinal wall and invade the tissues, ultimately entering the lung and returning to the small intestine to complete development to the parasitic female. In this circumstance, termed auto-infection, repeated generations of development may take place within a single host. [5]. Although strongyloidiasis is usually an indolent disease in immunocompetent hosts, it can cause a hyperinfective syndrome (disseminated strongyloidiasis) in immunocompromised hosts through the reproductive capacity of the parasite inside the host. Disseminated strongyloidiasis, if untreated, is associated with mortality rates of approximately 90% [8].
Despite its great medical importance, the threadworm S. stercoralis, is one of the most overlooked helminths [1]. The parasite's complex life cycle has long been considered a major impediment to attempts to control strongyloidiasis. Recently, the genome of S. stercoralis was sequenced and compared with other species of Strongyloides [9]. This comparative genomic study illuminates the use of genome-wide analysis to identify genes related to parasitism, to investigate diversity and population structures, and to determine the transmission route of S. stercoralis. Here, we aimed to determine the intra-species genomic variations of S. stercoralis present in Japan and Myanmar, which differ in socioeconomic status, history of infection and prevalence of this nematode.

Ethical statement
The Ethics Committees of the University of the Ryukyus and the University of Medicine-1 Yangon approved this study. Participants, who were informed of the study's aims and procedures, provided written informed consent. All individuals infected with S. stercoralis were treated with ivermectin.

Sample collection
Faecal samples were collected in 2014 (Table 1) in Okinawa, Japan, representing an area where S. stercoralis is non-prevalent and where S. stercoralis has not been endemic for at least the last 50 years [10], and Htantabin, Myanmar as a prevalent area where new infections frequently occur. In Okinawa, Japan, faecal tests were performed for inpatients in one hospital and residents of two elderly nursing homes located in the southern part of Okinawa. For Myanmar samples, a community survey was conducted in three different villages of Htantabin area. Faeces were incubated on 2% (w/v) agar plates at 25˚C for 2-4 days. This culture condition would allow a portion of parasites to undergo a complete free-living generation involving a sexual cross although worms may mate with their genetically identical siblings in the culture. Individual nematodes (iL3) that crawled out of the faeces were transferred to 0.2 ml tubes containing 10 μl of worm lysis solution (9 μl Direct PCR [Viagen], 0.5 μl of 20 mg/ml Proteinase K [Qiagen] and 0.5 μl of 1 M dithiothreitol [Wako]). The lysates were incubated at 60˚C for 1 h and then at 95˚C for 10 min. To identify nematodes, the 18S ribosomal RNA gene was amplified using 0.1 μl of worm lysate with the primers 988F and 1912R [11], and the amplicons were sequenced using an ABI 3130 sequencer (Applied Biosystems) with the BigDye Terminator v3.1 kit. Worm lysates were immediately used for further analysis or stored at −30˚C.

Whole genome amplification (WGA)
Genomic DNA was amplified from 1 μl of worm lysate using an Illustra GenomiPhi V2 kit (GE Healthcare) according to the manufacturer's protocol. Amplified products were qualitychecked using 1% agarose gel electrophoresis, purified using a QIAamp DNA Mini Kit (Qiagen) and quantified using Qubit (Life Technologies).

Illumina sequencing
Libraries were constructed using a Nextera DNA Sample Prep Kit (Illumina) with 100 ng of amplified DNA according to the manufacturer's protocol. The libraries were sequenced using an Illumina MiSeq with a v3 Reagent kit (600 cycles) according to the manufacturer's recommended protocol (https://icom.illumina.com/) to produce 300-bp paired-end reads to obtaiñ 3G base data. Non-WGA reads of the genome reference strain (SSTP) were obtained from NCBI SRA under accession number ERR066168, randomly sampled and used as a reference to evaluate WGA reads.

Principal component analysis
Principal component analysis (PCA) was performed using R (ver 3.1.1) implemented with SNPRelate package [17]. Bi-allelic SNPs were extracted from full variant information of all the samples and used for PCA analyses.

Reconstruction of mitochondrial genomes
The mitochondrial genomes of Rk4-1 nematodes were reconstructed from the Illumina reads using MITObim ver 1.6 [18]. In the first step, Illumina reads were mapped to the S. stercoralis reference sequence (Genbank accession No. NC_028624) to generate a seed for the second step. In the second step, gaps and ambiguous regions in the seed were replaced by iterative mapping that was repeated until all gaps were closed, and the number of reads remained constant. Reconstructed mitochondrial sequences were refined by correcting bases using ICORN2 [19], and the assembly was used to represent the Japanese nematode reference mitochondrial genome.

Phylogenetic reconstruction
Nucleotide sequences of SNP positions in scaffolds > 30 kb, which accounted for 96% of the total genome assembly, were extracted from the vcf files and were used to construct phylogenetic networks based on similarity/dissimilarity with the Neighbor Net method of SplitsTree4 [20]. Computational phasing of the diploid genotypic data was performed using SHAPEIT2 with its default parameters [21]. Phased sequence data from all samples were used to create a separate Maximum Likelihood tree using FastTree (ver 2.1.8) for each scaffold > 30 kb [22].
To generate a mitochondrial-based phylogeny, reads from each nematode sample were mapped to the Japanese parasite's reference sequence (see above) using SMALT v0.7.4, and SNPs were called using GATK [13]. The nucleotide sequences of the SNPs were extracted and used to generate Maximum Likelihood trees using FastTree (ver 2.1.8) [22].

Accession numbers
All sequence data were submitted to the DDBJ Sequence Read Archive (DRA) under project accession number PRJDB5112.

Whole genome amplification and re-sequencing
We re-sequenced the genomes of 33 S. stercoralis nematodes collected in Myanmar (prevalent region, nine from three patients) and Japan (non-prevalent region, 24 from six patients) [10] ( Table 1). We applied the WGA method [16] using the Illumina MiSeq to sequence the whole genome of a single nematode. We obtained 300-bp paired-end reads to > 20× coverage (> 3 Gb) for each nematode and mapped them to the S. stercoralis reference genome. The mapping ratios of each sample to the reference genome ranged from 77.46% to 96.96%, and the ratios for reads mapped in the correct orientation and distance ('proper paired' reads) ranged from 48.94% to 62.72% (S1 Table). In contrast, the mapping ratios of non-WGA reference reads were 90.95% with 71.79% proper pairs (S1 Table, Table,

Variant calls
We detected 298,202 variant positions, which accounted for 0.6% of the total genome, among the 33 samples when compared with the reference. Most variants were SNPs (231,583 positions), and small inserts or deletions (indels) were present at 67,655 positions (S2 Table). The number of variant positions in individual nematodes (including homozygous and heterozygous sites compared with the reference) ranged from 137,439-146,259 and 135,583-157,900 of the Myanmar and Japanese samples, respectively (S2 Table).
Comparisons with reference gene models revealed that 27.7% of the variants were located in intergenic regions, followed by 27.3%, 15.2%, 12.8% and 9.9% in exonic, upstream, downstream and intronic regions, respectively ( Fig 1A). There were higher frequencies of variant positions in intergenic regions compared with those of the individual nucleotides in the total genome and lower frequencies of variant positions in exonic regions (Fig 1A).
The distribution of SNPs along the four longest scaffolds is shown in S3A Fig and the   positions in 10-kb window ranging from 3 to 922 (median = 31), suggesting that they represented 'hotspots'. Further, the hotspot regions did not correspond to regions of high coverage mapping ( S2 Fig and S3 Fig) (Pearson's r = -0.01), suggesting that the variant call was not significantly influenced by WGA amplification bias. No significant differences in SNP distribution between the two countries were observed (high correlation coefficient between SNP numbers in 10-kb window of the two countries; Pearson's r = 0.78, p < 2.2e-16).

Population structure
Principal component analysis (PCA) of SNPs compared with the reference strain unambiguously separated the Japanese and Myanmar samples from the reference strain by the first PC, which account for 40.1% of the variance. Japanese and Myanmar samples were separated by the second PC (14.1% of variance) (Fig 2A). Fig 2B shows the PCA results without the reference. The Myanmar and Japanese samples were separated by PC1 (28.4%). PC2 (10.3%) grouped the Myanmar samples according to their host origins, although the separation in the Japanese samples was not unambiguous.
Pair-wise distances (π) of samples originated from different countries (Japan vs. Myanmar) were generally higher than those within populations (Fig 3). In the Myanmar samples, pairwise distances between hosts were higher compared to those within hosts, although such differences were not observed in the Japanese samples (Fig 3). Because the parasitic adult stage of Strongyloides is mitotically parthenogenetic, multiple larval progeny of such adults will be, in theory, genetically identical. Although within-host samples showed high similarity to each other (π values < 7.5e-04) both in Japan and Myanmar, they still exhibited some differences from each other. Because of possibility of errors in WGA or sequencing process and difficulty in heterozygous SNP call, it is difficult to conclude that they are genetically different or identical progeny. Simulated experiments using proved progeny of single adults will be useful to answer this question. Analysis of molecular variance (AMOVA) showed 23.7% of variance was associated with differences between populations and 6.3% with differences between hosts, whereas more than 100% of the variance was attributed to variation within samples (S3 Table). Although the negative phi-statistics and variance values observed in AMOVA (S3 Table) may reflect problems with sample size and analytical strategy, these results suggest a close relationship among the Japanese samples independent of host origin and high heterozygosity within the individual genomes.
Next, we constructed phylogenetic networks according to the SNPs, which support the PCA results (Fig 4). The tree contained two main clades, comprising Myanmar or Japanese samples. All samples in the Myanmar clade from the same host clustered together and were clearly distinct from those of other hosts. Most Japanese samples sub-clustered according to host origin, although the separations were not as clear as those of the Myanmar samples. Further, we found some Japanese samples (Rk5-6, Rk7-5, Rk8-3 and Rk8-8), which have lower coverage (S1 Table), were placed at positions distant from those of other worms of the same host origin. This is likely because of failure to call heterozygous SNP in low coverage samples [16]. We therefore removed these four samples (Rk5-6, Rk7-5, Rk8-3 and Rk8-8) and those having lower coverage than the four samples (based on % of genome regions with 15× coverage; MyHTB122-6 and Rk6-4) from further analyses. Two samples from host Rk9 (Rk9-3 and Rk9-11), which had higher coverage (S1 Table), occupied positions more distant from the other samples as well as a sample from host Rk9 (Rk9-6) ( Fig 4A).
Next, we used the computationally-phased sequence dataset for the Japanese samples to construct phylogenetic trees for each scaffold (> 100 kb). The two haplotypes in a genome, shown as A and B haplotypes in S4A Fig, separated  The mitochondrial tree exhibited a similar topology to the nuclear tree ( Fig 4B). The Japanese samples were placed into one clade, clearly separated from Myanmar samples with a high support value. Within the Japanese samples, those from host Rk9 clustered with those from host Rk8 and occupied the basal position of the other Japanese samples. As observed in the phylogenies of nuclear genomes, the samples from hosts Rk4, Rk5, Rk6 and Rk7 were closely related, but were unambiguously sub-grouped according to host origin. Samples from host Rk9 (Rk9-3, Rk9-6 and Rk9-11), which clustered separately in the nuclear tree, grouped together in the mitochondrial tree ( Fig 4B). Interestingly, mitochondrial genome sequences of worms from the same host origin were not perfectly identical (especially in worms from host Rk4) although differences were very small and this may be due to sequencing errors.

Genomic heterogeneity
Strongyloides stercoralis employs distinct modes of reproduction as follows: asexual parthenogenetic reproduction by parasitic females inside the host and sexual reproduction by free-living adults outside the host. Asexual reproduction may promote increased heterozygosity because of the absence of recombination and segregation in diploids (known as Mullers's ratchet or Meselson effect) [23,24]. We therefore compared the heterozygosities (π t ) of samples from Japan, where the parasites likely persist longer in the host through asexual auto-infection because no new infections are suggested to be unlikely to have occurred in Japan in the last 50 years [10] and our Japanese samples were collected from elderly people (Table 1), and samples from Myanmar to represent frequent new infections by larvae that arose through sexual reproduction.
Next, we compared the heterozygosities of the scaffolds assigned to autosomes and sex chromosome [9] of individual samples (Fig 5). Two main groups were observed as follows: 1. Myanmar samples with values ranging from 0.001-0.0015 in the sex and autosomal scaffolds, 2. The majority of Japanese samples had with higher heterozygosities compared with those of Myanmar samples in the sex and autosomal scaffolds. The exceptions Rk9-11, Rk9-3 and Rk8-7 were positioned separately from those shown in the plot. The autosomal heterozygosities of Rk9-3 were lower but had values similar to those of the sex chromosomes of the other Japanese samples, whereas the heterozygosities of the sex chromosomes of Rk8-7 were low and had a value consistent with that of the autosomes of the Japanese samples. The values of both the sex chromosomes and autosomes of Rk9-11 were low. In contrast, the numbers of homozygous SNP sites in these three samples (S7 Fig) were greater than other Japanese samples on the sex chromosome of Rk8-7, the autosomes of Rk9-3 and both types of chromosomes of Rk9-11 (with an increase of approximately 50% of autosomes compared with Rk9-3).
Together, these results suggest that samples Rk8-7, Rk9-3 and Rk9-11 arose through recent sexual crossing between very closely-related individuals and acquired more homozygous chromosome pairs in sex chromosomes and autosomes. These findings likely explain the positions of Rk9-3 and Rk9-11 in the network tree, which were distant from Rk9-6 (Fig 4).

Discussion
A major weakness of research on parasitic helminth genomes is the inability to obtain sufficient quantities of DNA because at present, none of these parasites can be cultured through its entire life cycle outside of a living host. Nevertheless, the WGA technique may solve this problem by producing high yields of whole genomic DNA from a single parasite [16]. Here we used the WGA technique combined with the NGS technology to re-sequence the entire genomes of individual S. stercoralis to acquire a better understanding the population structure of this medically important human pathogen. To the best of our knowledge, this study represents the first genome-wide approach to estimate the genotypic variations in S. stercoralis populations.
We show here that WGA detects variants with sensitivity comparable with those of normal variant detection methods, although WGA requires more data (coverage) to correctly call heterozygous positions, likely because of amplification bias. Here, our analysis of nematodes collected in Japan and Myanmar detected approximately 0.3 million variant positions, representing 0.6% of the genome, by comparison to the reference strain isolated from a dog in the United States. Although the reference and samples in this study were originally isolated from different hosts, this level of diversity represents as low as the diversity of C. elegans (~0.05%) [27] compared with other nematodes such as Pristionchus pacificus (~2%) [28] and Bursaphelenchus xylophilus (~4%) [29]. This may be explained by the relatively recent divergence of S. stercoralis from a common ancestor of S. stercolaris and the sister species, stronger selective pressure on the obligate parasite compared with free-living organisms such as P. pacificus, or facultative parasites such as B. xylophilus or both. Additionally, the unique mode of reproduction of this species may have affected the diversity level. S. stercoralis is distributed worldwide in areas with warm climates, and it will be interesting to analyse the diversity of S. stercoralis isolated in Africa, South America and Australia to study their global diversity. The data from such an analysis may illuminate the origin and migration routes of S. stercoralis and allow comparison of these attributes in populations of the parasite in humans and dogs as gene flow of parasites are generally determined by host movement [30].
Besides the human strongyloidiasis situations in the two countries (Japan and Myanmar), situations of Strongyloides infection in dogs are also likely to differ between the two countries. Strongyloides infection rate in dogs was reported to be as low as 0.4% in Okinawa, Japan [31]. Although we can't find any reports about Myanmar canine strongyloidiasis, infection rate in Myanmar is possibly very high as reported in other Southeast Asian countries [32,33]. Therefore, a genome-wide investigation of their population structures would be of interest to see if a similar intra-genome heterozygosity trend can be observed as in human Strongyloides and to identify if there are interspecies transmissions between dogs and humans.
The phylogenetic relationships inferred from nuclear and mitochondrial SNPs were basically similar to each other. However, the relationships of Japanese samples observed in the nuclear trees were more complicated and therefore difficult to interpret. We found this is likely not only because the Japanese samples originated from a small gene pool but is also potentially explained by independent evolution of two haplotypes of the diploid genomes through asexual reproduction. This suggests that analyses of heterozygosity (e.g. by phasing) are useful and necessary to gain a better understanding of the structures of populations of S. stercoralis.
Because S. stercoralis has not been endemic in Japan for decades [10], the Japanese samples collected from elderly hosts aged 58-104 years ( Table 1) may have been maintained only by auto-infection cycles for a long time. The higher heterozygosity of Japanese compared with Myanmar samples is thus possibly explained by an accumulation of heterozygous positions during the auto-infection cycle [10]. The exceptions Rk9-3, Rk9-11 and Rk8-7, which have reduced heterozygosity in sex or autosomal scaffolds or both are likely explained by recent cross events between two very closely related individuals, possibly during their isolation from a faecal culture. This, in turn, provides robust evidence that parthenogenesis of the parasitic female is mitotic (non-meiotic) and that free-living adults exchange chromosomes outside the host. Further, positive F IN (inbreeding efficiency) values of the Myanmar samples suggest that new infections occur in the prevalent regions by infective larvae produced through sexual reproduction between closely related individuals.
It has been suggested that new infections are unlikely to have occurred in Japan in the last 50 years [10]. Assuming that the genomic mutation rate of S. stercoralis is the same as that of C. elegans (9 Ã 10 −9 /site/generation) [34] and the minimum S. stercoralis generation time is 8 days, 50 years of asexually cycling within a human host can cause approximately 1,900 heterozygous sites to accumulate in the 86-M base diploid genome. Although this value is high, it is onlỹ 20% of the number of differences observed between samples isolated in Japan and Myanmar (S2 Table). These values suggest that the frequency of sexual reproduction, which can reduce heterozygosity, is also an important factor for determining the number of heterozygous sites in the nematode genome. The analysis of heterozygosity can therefore serve to help draw inferences about the history of infections and the prevalence of parasites in a specific area.
Supporting Information S1 Table. Mapping statistics of 300-base paired-end reads to the reference genome. Table shows