Population Structure in Naegleria fowleri as Revealed by Microsatellite Markers

Naegleria is a genus of free-living amoebae belonging to the class Heterolobosea. Over 40 species of Naegleria have been identified and recovered worldwide from habitats such as swimming pools, freshwater lakes, soil and dust. Among them, N. fowleri is a human pathogen responsible for primary amoebic meningoencephalitis (PAM). Only around 300 cases have been reported worldwide in 40 years, but PAM is a fatal disease of the central nervous system, with only 5% of infected patients surviving. Since both pathogenic and non-pathogenic species are encountered in the environment, detection and dispersal mode are crucial points in the fight against this pathogen. Previous studies on the identification and genotyping of N. fowleri strains relied on RAPD analysis and ITS sequencing and identified five variants: Euro-American, South Pacific, Widespread, Cattenom and Chooz. Microsatellites are powerful markers in population genetics with a broad spectrum of applications (such as paternity testing, fingerprinting, genetic mapping and genetic structure analysis). They are characterized by a high degree of length polymorphism. The aim of this study was to genotype N. fowleri strains using microsatellite markers in order to track this population and to better understand its evolution. Six microsatellite loci and 47 strains from different geographical origins were used for this analysis. The microsatellite markers revealed a higher level of discrimination than any other marker used until now, enabling the identification of seven genetic groups nested within the five main genetic groups defined by the previous RAPD and ITS analyses. This analysis also allowed us to go further by identifying private alleles that highlight intra-group variability. This type of analysis could improve the identification of N. fowleri isolates and allow better tracking of clinical and environmental N. fowleri strains.


Introduction
The free-living amoeboflagellate Naegleria fowleri is the causative agent of primary amoebic meningoencephalitis (PAM) in humans, a rare but rapidly fatal disease of the central nervous system [...] [15]

Microsatellite isolation
Partial genomic libraries. Total DNA was extracted by phenol-chloroform extraction according to the conventional procedure. To avoid the presence of extrachromosomal DNA, total DNA was loaded onto a 0.8% agarose gel (SeaPlaque Agarose, Lonza-Ozyme, Montigny-Le-Bretonneux, France) and run for 22 h in TAE buffer at 20 V and 16 mA. The selected DNA, corresponding to chromosomal DNA, was digested with Sau3A (Boehringer Mannheim-Roche Diagnostics, Meylan, France). Restriction fragments of approximately 300 to 900 bp were transferred onto DEAE paper (Sambrook 1989). The fragments were ligated into the pUC18 vector (Amersham Biosciences-GE Healthcare, Glattbrugg, Switzerland) and amplified after transformation into competent XL blue cells (Amersham Biosciences-GE Healthcare).
Screening and sequencing. Clones were plated on solid LB medium and transferred onto Hybond-N nylon membranes (Amersham Biosciences-GE Healthcare). The screening was performed with an equal mix of the oligonucleotides (TC)10, (TG)10, (CAC)5CA, CT(CCT)5, CT(ATCT)6 and (TGTA)6TG, labelled with the DIG oligonucleotide tailing kit (Boehringer Mannheim-Roche Diagnostics). Positive clones were directly analyzed by sequencing (Amersham, Pharmacia-Biotech T7 sequencing kit-GE Healthcare). Five of these clones were retained for the analysis.
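Once a positive clone has been sequenced, the presence of a repeat can be checked computationally. The following sketch is not part of the original protocol; it is a minimal illustration, assuming a hypothetical helper `find_microsatellites` and an arbitrary threshold of five tandem copies, of how perfect repeats with a 2 to 4 base motif could be located in a sequence. Imperfect repeats, which are frequent at the loci described here, would not be detected by this simple pattern.

```python
import re

def find_microsatellites(sequence, min_repeats=5):
    """Return (start, motif, copies) for perfect tandem repeats.

    The motif is the shortest unit of 2-4 bases repeated at least
    `min_repeats` times in a row (hypothetical threshold).
    """
    # Lazy {2,4}? makes the regex prefer the shortest repeating unit.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # ignore homopolymer runs such as AAAAAAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Made-up sequence for illustration only (not a clone from this study).
example = "GATTACA" + "TC" * 10 + "GGCCA" + "ATCT" * 6 + "TTG"
print(find_microsatellites(example))
# -> [(7, 'TC', 10), (32, 'ATCT', 6)]
```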

Data analyses
Genepop 4.2.2 software [28] was used to calculate allele frequency-based correlations (FIS, FST and FIT). FIS is the proportion of the total inbreeding within a population due to inbreeding within sub-populations. It ranges between -1 and 1, where a negative value corresponds to an excess of heterozygotes and a positive value to a heterozygote deficiency; FIS = 0 indicates Hardy-Weinberg proportions. FST is the proportion of the total inbreeding in a population due to differentiation among sub-populations. Mean FST estimates over loci in each population were calculated with the FSTAT software (version 2.9.3.2, [29]) using Nei's estimators. FIT is the total inbreeding in a population due to both inbreeding within sub-populations and differentiation among sub-populations. Expected and observed heterozygosity (He and Ho, respectively) were estimated in R with the adegenet package [30]. A factorial correspondence analysis (FCA), as implemented in GENETIX 4.05 [31], was performed; it places the individuals in a three-dimensional space according to the similarity of their allelic states.
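To make the relation between these quantities concrete, the sketch below computes Ho, He and FIS for a single locus from diploid genotypes. It is a simplified illustration and not the estimators implemented in Genepop, FSTAT or adegenet, which apply their own sample-size corrections; the function name `locus_stats` and the genotypes are hypothetical.

```python
from collections import Counter

def locus_stats(genotypes):
    """Return (Ho, He, FIS) for one locus.

    genotypes: list of (allele1, allele2) tuples, allele sizes in bp.
    He is the uncorrected expected heterozygosity, 1 - sum(p_i^2).
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are taken over the 2n allele copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    # FIS = 1 - Ho/He; undefined when the locus is monomorphic (He == 0).
    fis = None if he == 0 else 1.0 - ho / he
    return ho, he, fis

# Hypothetical genotypes (not taken from Table 3).
all_het = [(179, 180)] * 4                                   # only heterozygotes
no_het = [(116, 116), (116, 116), (120, 120), (120, 120)]    # two alleles, no heterozygotes

print(locus_stats(all_het))  # excess of heterozygotes, FIS = -1
print(locus_stats(no_het))   # no heterozygotes, FIS = 1
```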
Assignment testing was performed with the GENECLASS 2.0 software package [32]. The Bayesian method of Rannala and Mountain [33] was selected to perform self-assignment tests with the algorithm of Paetkau et al. [34], simulating 1000 individuals with alpha = 0.001.
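The principle of a Bayesian assignment can be sketched as follows. This is only an illustration in the spirit of the Rannala and Mountain criterion, assuming a Dirichlet prior of weight 1/K on each of the K alleles of a locus; it is not the GENECLASS implementation, and it omits both the leave-one-out step used in self-assignment and the resampling algorithm of Paetkau et al. The function names, group labels and genotypes are hypothetical.

```python
from collections import Counter
from math import exp, log

def genotype_log_likelihood(genotype, ref_alleles, k):
    """Log-probability of a diploid genotype at one locus given the allele
    copies sampled in a reference group (Dirichlet prior, weight 1/k each)."""
    counts = Counter(ref_alleles)
    n = len(ref_alleles)
    a, b = genotype
    wa = counts[a] + 1.0 / k
    wb = counts[b] + 1.0 / k
    denom = (n + 1) * (n + 2)
    if a == b:
        p = wa * (wa + 1) / denom   # homozygote
    else:
        p = 2 * wa * wb / denom     # heterozygote
    return log(p)

def assign(individual, references):
    """individual: {locus: (a1, a2)}; references: {group: {locus: [alleles]}}.
    Returns the best-scoring group and the log-likelihood of every group."""
    scores = {}
    for group, loci in references.items():
        total = 0.0
        for locus, genotype in individual.items():
            # K = number of distinct alleles seen at this locus overall.
            alleles = set(genotype)
            for other in references.values():
                alleles.update(other[locus])
            total += genotype_log_likelihood(genotype, loci[locus], len(alleles))
        scores[group] = total
    return max(scores, key=scores.get), scores

# Hypothetical reference groups and individual (illustrative values only).
references = {
    "G1": {"locA": [179, 180, 179, 180], "locB": [225, 225, 225, 225]},
    "G2": {"locA": [162, 197, 162, 197], "locB": [252, 252, 225, 225]},
}
individual = {"locA": (179, 180), "locB": (225, 225)}

best, scores = assign(individual, references)
print(best, {g: round(exp(s), 5) for g, s in scores.items()})
```

In this toy case the individual is assigned to G1, whose reference sample carries both of its alleles at each locus.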
A matrix of genetic distances was computed with the POPULATIONS 1.2.32 software using the Cavalli-Sforza and Edwards model. Phylogenetic networks were inferred from the distance matrix obtained from the microsatellite dataset by using the Neighbor-Net method in SplitsTree 4.13.1 [35].
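As an illustration of the kind of distance that feeds the network, the sketch below computes a Cavalli-Sforza and Edwards chord distance between two groups from their allele frequencies, using the commonly cited multi-locus form D = (2 / (pi r)) * sum over the r loci of sqrt(2 (1 - sum_i sqrt(x_i y_i))). This is an assumption about the formula; the exact variant implemented in POPULATIONS may differ in detail, and the function name and frequencies are hypothetical.

```python
from math import pi, sqrt

def chord_distance(freqs_x, freqs_y):
    """Chord distance between two groups.

    freqs_x, freqs_y: {locus: {allele: frequency}} with the same loci.
    """
    loci = list(freqs_x)
    total = 0.0
    for locus in loci:
        fx, fy = freqs_x[locus], freqs_y[locus]
        # Sum of sqrt(x_i * y_i) over all alleles seen in either group.
        overlap = sum(sqrt(fx.get(a, 0.0) * fy.get(a, 0.0)) for a in set(fx) | set(fy))
        # max() guards against tiny negative values caused by rounding.
        total += sqrt(max(0.0, 2.0 * (1.0 - overlap)))
    return 2.0 * total / (pi * len(loci))

# Hypothetical allele frequencies (illustrative only).
group_1 = {"locA": {179: 0.5, 180: 0.5}}
group_2 = {"locA": {162: 0.5, 197: 0.5}}

print(chord_distance(group_1, group_1))  # identical groups give 0.0
print(chord_distance(group_1, group_2))  # no shared allele gives the maximum, 2*sqrt(2)/pi
```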

Ethics Statement
The French sites (A-G) mentioned in Materials & Methods and Table 1 are industrial sites for which the authorizations have been obtained. These sites are secure and their precise location cannot be mentioned. The field studies did not involve endangered or protected species.

Results
From the partial library, 3500 clones were screened; 30 were positive and were sequenced. A number of clones were excluded, either because no microsatellites were present or because amplification was not possible, probably due to chimeric alleles. Five clones were selected for intra-specific analysis and referred to as NG25, NG42, NG69, NG115 and NG141. The NG42 and NG141 loci contained different microsatellite repeats and were each divided into two parts for further analysis (NG42-1 and NG42-2, and NG141-3 and NG141-5, respectively). NG115 was found to be monomorphic, with one allele at 115 bp for all strains, and was not used thereafter. Thus, a total of six microsatellite markers were used for this analysis. The locus composition is indicated in Table 2 and shows that most of them are imperfect repeats.
Microsatellite analyses displayed one or two alleles per locus, so that strains could be scored as homozygous or heterozygous. The exception was NG69, which displayed three alleles for most of the strains due to a duplication of this locus (Table 3): one identical 214 bp allele was present in all samples. It was subsequently cloned and its sequence was identical to the other ones from the NG69 locus. The duplication was confirmed by a BLAST analysis of the recently sequenced N. fowleri genome [36], which indicates that each locus is represented on one contig, except for NG69, which is present on two different contigs.

Genetic diversity
The tested microsatellites revealed different levels of polymorphism in our samples. NG141-3 was the most variable marker, with 10 alleles identified, whereas the other microsatellites produced between 3 and 5 alleles.
The SP strains carried the highest number of alleles (16), whereas the CAT, EA and WP strains carried 11, 13 and 12 different alleles, respectively (Table 3).
For the NG141, NG69 and NG42-2 markers, the overall observed heterozygosity per locus was higher than expected, reflecting an excess of heterozygosity that was confirmed by the negative FIS values (Table 4). In contrast, the observed heterozygosity per locus was lower than expected for the NG42-1 marker, indicating a deficiency of heterozygosity. Surprisingly, no heterozygotes were found with the NG25 microsatellite, which explains the values of 0 and 1 obtained for Ho and FIS, respectively.
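The NG25 values follow directly from the usual single-locus relation between the inbreeding coefficient and heterozygosity. As a worked check, assuming the basic definition rather than the exact estimator implemented in the software:

```latex
F_{IS} = 1 - \frac{H_o}{H_e},
\qquad
H_o = 0,\; H_e > 0 \;\Longrightarrow\; F_{IS} = 1 - \frac{0}{H_e} = 1 .
```

Conversely, any locus with H_o > H_e gives F_IS < 0, which is the pattern reported for NG141, NG69 and NG42-2.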

Population structure
The factorial correspondence analysis (FCA) recovered the five variants previously identified (EA, WP, SP, CHO and CAT) together with two additional ones, NZ and RA, which were included in the SP and WP groups, respectively, in the previous RAPD and ITS analyses (Fig 1). Because they are identified here by microsatellite markers, these variants are hereafter named genetic groups. The GENECLASS program was used to assess the assignment of the 47 strains to these seven genetic groups. Overall, 97.9% of the strains were correctly assigned. The SP strain NG060 and the CAT strain JP45E were not assigned, and the EA strain SW1 was placed within the WP group with a very weak probability (p = 0.016) (S1 Table).
The existence of this structure was also supported by the NeighborNet network (Fig 2). The genetic differentiation between the seven groups was very high, with FST values varying from 0.29 to 0.61 (Table 5).
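For readers unfamiliar with how such differentiation values arise, the sketch below computes an uncorrected Nei-type G_ST = (H_T - H_S) / H_T from allele frequencies, summing diversities over loci before taking the ratio. It is a simplified stand-in and not the sample-size-corrected Nei estimators of FSTAT; the function name `nei_gst` and the frequencies are hypothetical.

```python
def nei_gst(pop_freqs):
    """Uncorrected G_ST among groups.

    pop_freqs: list of {locus: {allele: frequency}}, one dict per group,
    all groups sharing the same loci. Returns None if there is no diversity.
    """
    loci = list(pop_freqs[0])
    s = len(pop_freqs)
    hs_sum = 0.0
    ht_sum = 0.0
    for locus in loci:
        # H_S: within-group gene diversity, averaged over groups.
        hs = sum(1.0 - sum(p * p for p in pop[locus].values()) for pop in pop_freqs) / s
        # H_T: gene diversity of the unweighted mean allele frequencies.
        alleles = set().union(*(pop[locus] for pop in pop_freqs))
        mean_p = [sum(pop[locus].get(a, 0.0) for pop in pop_freqs) / s for a in alleles]
        ht = 1.0 - sum(p * p for p in mean_p)
        hs_sum += hs
        ht_sum += ht
    if ht_sum == 0:
        return None
    return (ht_sum - hs_sum) / ht_sum

# Hypothetical frequencies at one locus (illustrative only).
fixed_a = {"locA": {116: 1.0}}
fixed_b = {"locA": {120: 1.0}}
mixed = {"locA": {116: 0.5, 120: 0.5}}

print(nei_gst([fixed_a, fixed_b]))  # groups fixed for different alleles: 1.0
print(nei_gst([mixed, mixed]))      # identical groups: 0.0
print(nei_gst([fixed_a, mixed]))    # intermediate differentiation
```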
Both the FCA and the NeighborNet network placed the three strains SW1, JP45E and NG060 within their respective groups. The FCA showed that the NZ group is well separated from the other ones, whereas the SP and EA groups were found to be close. In the NeighborNet network, the SP, NZ and CAT groups are closely related to the EA group, while the CHO and RA groups branch with WP (Fig 2). However, the ambiguous splits throughout the network indicate that the relationships within the clusters and groups are not resolved.

Table 3. Allelic combination of the microsatellite markers for the strains examined.

Variation within genetic groups
The different groups displayed specific profiles with the NG141-3 marker (Table 3). Each marker identified some genetic groups, but not all. For example, NG42-2 specifically detected the WP and SP groups, whereas the EA, CHO and CAT groups shared the same alleles. Differentiation within a given group could also be observed. For instance, the NG141-3 locus generated four different profiles within the EA group: the heterozygous profile 179 bp-180 bp was observed only in France, the heterozygous profile 162 bp-197 bp was observed only in the USA, and the two remaining profiles (179 bp and 179 bp-197 bp) were found in both the USA and France (Table 3). Note that the heterozygous profile 179 bp-197 bp was found only once, at one French site, among the twenty French strains examined, whereas the heterozygous profile 179 bp-180 bp was recovered in the majority of them. A number of strains also had a specific microsatellite profile at one or several loci. For instance, the B strains displayed a singular profile with the 252 bp allele at the NG42-2 locus. This private allele was not found elsewhere. The difference of 27 bp between this allele and the other one (225 bp) led us to sequence them. As expected, the result revealed an insertion of 27 nucleotides that is not due to microsatellite variation. This insertion, located at the beginning of the sequence, could cause a mismatch with the upper primer, which would explain the weaker peak intensity of the 252 bp allele compared with the 225 bp allele (S1 Fig). Other private alleles could be identified, particularly in the SP group (Table 3). The two strains KUL and Moj 200 shared the same homozygous profile of 179 bp at the NG141-3 locus, which was not observed elsewhere either.
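The notion of a private allele used above can be expressed as a simple tabulation: an allele is private when it is carried by strains of one group only. The sketch below assumes a hypothetical helper `private_alleles` and a toy input loosely modelled on the NG42-2 example (252 bp versus 225 bp); the values are illustrative and are not a transcription of Table 3.

```python
from collections import defaultdict

def private_alleles(genotypes_by_group):
    """Find alleles observed in a single group.

    genotypes_by_group: {group: {locus: [(a1, a2), ...]}}
    Returns {group: {locus: sorted list of private alleles}}.
    """
    # locus -> allele -> set of groups in which the allele was seen
    seen = defaultdict(lambda: defaultdict(set))
    for group, loci in genotypes_by_group.items():
        for locus, genotypes in loci.items():
            for genotype in genotypes:
                for allele in genotype:
                    seen[locus][allele].add(group)
    result = defaultdict(dict)
    for locus, alleles in seen.items():
        for allele, groups in alleles.items():
            if len(groups) == 1:
                (group,) = groups
                result[group].setdefault(locus, []).append(allele)
    return {g: {loc: sorted(a) for loc, a in loci.items()} for g, loci in result.items()}

# Toy data (illustrative only).
toy = {
    "site_B": {"NG42-2": [(225, 252), (225, 252)]},
    "others": {"NG42-2": [(225, 225), (225, 225)]},
}
print(private_alleles(toy))
# -> {'site_B': {'NG42-2': [252]}}
```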
For the same DNA strains, similar profile results were observed, even though some parameters were changed, such as Taq DNA polymerases (Econotaq, Ozyme and Hot-Master Taq), PCR programs (temperature and number of cycles), or sequencers (MEGABACE 1000 or 96-capillary ABI 3730xl DNA Analyzer) (S2 Table).

Discussion
Microsatellite markers revealed a higher level of discrimination than any other marker used until now, and were used in this study to further explore the genetic diversity of N. fowleri and establish its population structure. Observing the same microsatellite profiles a few years apart with different materials and procedures, i.e. different DNA extraction kits and sequencers, demonstrates the reproducibility of these markers. Another important point is the stability of the allelic profiles: DNA from strains isolated a long time ago shows the same profiles as DNA from recently isolated strains. In particular, identical specific profiles with NG42 and NG141 were observed for the B or F strains.
In agreement with previous studies, numerous heterozygotes were observed with most of the microsatellite markers, suggesting that this species is diploid [37][38][39]. This was confirmed by the recently published genome of N. fowleri [36]. All loci are present in one copy in the N. fowleri genome except for NG69, which is present on two contigs. This confirms our duplication hypothesis, which explains the third allele of 214 bp present in all samples.
The previous analyses based on the RAPD and ITS data identified five main genetic groups: EA, WP, SP, CAT and CHO. The last two groups encompass predominantly French strains and exhibited similarities with SP [6,15]. Specifically, the strains from the CHO group are exclusively French, originating in eastern France, and they belong to type 4 [15,19], whereas the CAT group contains both French and Japanese strains. The microsatellite markers confirmed these five groups and, within two of them (SP and WP), enabled the identification of two new groups: the NZ group, which was considered a member of SP in the previous analysis based on the RAPD and ITS data, and the RA group, which belonged to the WP group. Each of these groups emerged from the different genetic analyses, such as the factorial correspondence analysis and the NeighborNet network in SplitsTree. Furthermore, the different groups were found to be well differentiated, as shown by the very high FST values.
The examination of the NeighborNet network could provide some information about the evolutionary history of this species. The branching pattern of SP, NZ and CAT as well as EA was in agreement with the results obtained with RAPD and ITS data [6,15]. In contrast, the branching of CHO with WP was unexpected, since CHO displays ITS and RAPD similarities with SP, NZ and CAT [6,15]. According to the microsatellite data, the CHO and RA groups shared the same specific profile at the NG25 locus (116 bp), which could explain this grouping. Another unexpected result has to do with the emergence of the additional group RA. On the basis of the RAPD and ITS data, this group belonged to the WP group. However, in this study, it can be considered a distinct group according to the FST value (Table 5 and Figs 1 and 2). In any case, the RA group remains very closely related to the WP group, and more extensive sampling is needed to confirm the branching pattern obtained in this study.
As previously found with RAPD and ITS analyses, ubiquitous genotypes were also observed with the microsatellite markers, although these markers were more discriminating. This confirmed that significant variability was not necessarily correlated with the geographical origin of the strains. For example, within the EA group, the two French strains D1 and D2, from the same geographical site, displayed two different allele profiles, one of which was identical to that of the USA strains (WM and Lovell). Similar cases were observed within the WP group. Conversely, within the CAT group, the Japanese and French strains are distinguishable with microsatellites, in contrast to the RAPD and ITS markers. Additionally, the South Pacific strains exhibited allelic profiles that were not found elsewhere, and showed higher variability than the other groups.
The ubiquity of the genetic groups suggested clonal dispersion of the species. Another point supporting this type of propagation is the excess of heterozygosity observed with most markers, reflected by the negative FIS values. However, no heterozygosity was observed with NG25 although variability was present (four alleles described).
Most of the variants (EA, WP and SP) were observed among both clinical and environmental isolates (Table 1). So far, the other variants, CAT and CHO, are only of environmental origin. As already mentioned, the fact that some variants are either less prevalent in the environment or endemic could explain why they are not currently found in clinical cases [40]. It is therefore very likely that all variants of N. fowleri are pathogenic. Additionally, no difference in virulence among them has been reported so far.
Tracking N. fowleri, as a potential diagnostic tool, is important to understand the dispersal mode of this species and its colonization of the various sites. In a previous article, we underlined the value of fingerprinting the pathogenic species [6]. Discriminating markers allow better exploration of the environment and help establish the level of genetic diversity at the sites. In addition, since more and more PAM cases are reported worldwide [41], these markers could allow a better identification of the N. fowleri isolates in patients. Until now, the ITS markers were commonly used for detecting the variability of the N. fowleri species. However, the genotypes identified in this way are not sufficiently informative. For example, the EA group, which is represented only by ITS genotype 3, is widely distributed across the European and American continents. In contrast, at least four different profiles were obtained within the EA group with our microsatellite markers. This makes it possible to better understand and identify the clinical and environmental N. fowleri strains.

Conclusion
Microsatellite markers confirmed the genetic heterogeneity within the species and can lead to a better tracking of the N. fowleri isolates. These markers were able to differentiate the genetic groups and, within these groups, some strains by highlighting private alleles.
Additionally, microsatellite markers allowed a better understanding of the evolutionary history of this species as well as of its dispersal mode, by determining the prevalence of the genetic groups at the sites. Our results recovered the five genetic groups previously identified with ITS and RAPD studies. However, the different clustering methods as well as all the tests used in this study produced two supplementary genetic groups, RA and NZ, that were previously included in the WP and SP groups, respectively. Interestingly, the RA group is very closely related to WP. Previous results based on ITS data showed that all members of WP and RA were identical. One might then suggest that WP and RA have diverged from the same lineage. However, additional data should be collected to confirm whether it is indeed an isolated group. Similarly, strains from other geographical origins, such as African and Asian ones, are necessary to confirm the genetic groups, and additional microsatellite markers could be identified from the recent sequencing of the N. fowleri genome in order to reinforce this study.

Table. N. fowleri analysis. N. fowleri strains examined before this study. (XLSX)