Local and global genetic diversity of protozoan parasites: Spatial distribution of Cryptosporidium and Giardia genotypes

Cryptosporidiosis and giardiasis are recognized as significant enteric diseases due to their long-term health effects in humans and their economic impact in agriculture and medical care. Molecular analysis is essential to identify species and genotypes causing these infectious diseases and provides a potential tool for monitoring. This study uses information on species and genetic variants to gain insights into the geographical distribution and spatial patterns of Cryptosporidium and Giardia parasites. Here, we describe the population heterogeneity of genotypic groups within Cryptosporidium and Giardia present in New Zealand using gp60 and gdh markers to compare the observed variation with other countries around the globe. Four species of Cryptosporidium (C. hominis, C. parvum, C. cuniculus and C. erinacei) and one species of Giardia (G. intestinalis) were identified. These species have been reported worldwide and there are not unique Cryptosporidium gp60 subtype families and Giardia gdh assemblages in New Zealand, most likely due to high gene flow of historical and current human activity (travel and trade) and persistence of large host population sizes. The global analysis revealed that genetic variants of these pathogens are widely distributed. However, genetic variation is underestimated by data biases (e.g. neglected submission of sequences to genetic databases) and low sampling. New genotypes are likely to be discovered as sampling efforts increase according to accumulation prediction analyses, especially for C. parvum. Our study highlights the need for greater sampling and archiving of genotypes globally to allow comparative analyses that help understand the population dynamics of these protozoan parasites. Overall our study represents a comprehensive overview for exploring local and global protozoan genotype diversity and advances our understanding of the importance for surveillance and potential risk associated with these infectious diseases.

Introduction Infectious diseases are the major leading causes of death and disability worldwide [1]. Classical public health and sanitation measures have long served to minimize dissemination and human exposure to many pathogens that are spread by routes such as contaminated water or via vectors [2] but the recent Ebola and Zika outbreaks have highlighted the need for a global framework to reduce the risk and mitigate health crises [3][4][5]. Determining genetic diversity and the distribution of microbes can guide more efficient interventions that decrease the incidence of infectious diseases. Understanding patterns of geographical distribution and genetic variation is essential to monitor pathogen evolution and adaptation, design effective surveillance strategies, identify factors driving within and cross species transmission, and ultimately, reduce the burden of disease [6][7][8].
Gastroenteritis is a well-known public health threat and a significant burden of disease that requires improved monitoring [2,9]. A substantial number of gastroenteritis cases around the world are caused by the parasitic protozoans Cryptosporidium and Giardia [10][11][12][13][14]. These organisms are recognized as major causes of parasite-induced diarrhoea in humans and other animals that can be spread through various means, but transmitted mainly by interactions between humans and animals and by contaminated water or food [14]. Most human cryptosporidiosis cases worldwide are caused by Cryptosporidium parvum and C. hominis [15], although other species in domesticated and wild animals can infect humans and cause high morbidity and mortality in children, travellers and immunocompromised individuals [11,[16][17][18][19][20][21]. It has been suggested that Cryptosporidium can be found in 1% of stools of immunocompetent hosts from high-income countries and 5-10% of hosts from low-resource settings but recent molecular studies suggest an underestimation of the reported frequency of infection and the global burden of the disease [16,22]. Giardia is a common enteric parasite that infects a wide range of mammals, including domestic animals (livestock, dogs and cats), birds and amphibians. In humans, G. intestinalis (synonyms G. lamblia and G. duodenalis) is the most common intestinal protozoan parasite, frequently reported in association with water-and food-borne outbreaks [15,23,24], affecting about 280 million people worldwide [25] with some 500,000 new symptomatic cases reported each year in developing countries only [26].
Species within each of these two protozoan genera are morphologically indistinguishable and their taxonomic and genotype differentiation is mainly based on molecular characterization [14,[27][28][29]. Multilocus sequence typing has been developed for identification and genotyping of Cryptosporidium and Giardia but the most commonly employed markers in epidemiological investigations and population studies are the cell surface glycoprotein (gp60) and glutamate dehydrogenase (gdh), respectively [30,31]. The gp60 and gdh genes have been readily adopted as a key component for molecular epidemiological investigations because of their high reliability in the characterization of genotypes, standardization in genotype nomenclature and detection of variants within populations [15]. Although not all Cryptosporidium species can be detected with the gp60 typing tool [23, 32-34] a high variety of subtype families have been identified based on this gene, whilst several assemblages can be determined using gdh for Giardia.
Recent studies have revealed that some genotypes are genetically diverse, host restricted and comprise a zoonotic or anthroponotic reservoir. For instance, C. parvum subtype families IIc and IIe are considered anthroponotic and IIa is predominant in humans and other animals worldwide [35] whilst G. intestinalis genotypes A and B are the only assemblages found in humans [23]. Additionally, the prevalence of some genotypes of Cryptosporidium and Giardia are unevenly distributed in different regions of the world [15, 36] with major occurrences of novel and rare variants in developing countries compared to developed countries [15,30]. However, both diseases have been reported with a high incidence rate in New Zealand [15,37]. The peak of cryptosporidiosis cases each year coincides with the cattle calving season [38,39], while giardiasis incidence is more evenly distributed throughout the year (Fig 1). The annual number of notifications reported in New Zealand range between 500 and 700 and 1600 and 1800 for cryptosporidiosis and giardiasis, respectively [40]. Here, we reported the species and genotypes of Cryptosporidium and Giardia in New Zealand using gp60 and gdh markers during a long-term monitoring study (2009)(2010)(2011)(2012)(2013)(2014)(2015). New Zealand has a recent and limited domestic animal movement [41] and it is an excellent island system example to compare genetic diversity against continental situations. Genotypes for species isolated from New Zealand were contrasted with those recovered from other countries to understand their spatial distribution and similarity. Specifically, we determined the genetic variants of gp60 subtype families in Cryptosporidium and gdh assemblages in Giardia to address the following questions: What are the subtype families of Cryptosporidium and assemblages of Giardia in New Zealand detected with gp60 and gdh genetic markers? How is the genetic variation of these parasites distributed in New Zealand compared to the rest of the world?

Materials and methods Sampling
We performed an observational study of giardiasis and cryptosporidiosis in New Zealand. Both diseases are notifiable in New Zealand; therefore, suspected cases are confirmed through accredited laboratories. Fresh feacal samples were screened from symptomatic humans in New Zealand between 2009 and 2015 using either fluorescence microscopy or ELISA analysis in three national diagnostic laboratories, from both the North and South Islands of New Zealand. Positive stool samples were sent to the Hopkirk Research Institute, Massey University, New Zealand for further molecular typing analyses (see below). Human samples were submitted anonymously and no information about the patients was available. Our human sample covers 10% (range 4-19%) notified cryptosporidiosis and 13% (2-32%) notified giardiasis cases from 2009 to 2015 in New Zealand. A microscopy-based screen was employed for feacal samples of other sources that were obtained from farms, zoos, animal hospitals and urban wildlife, on an ad hoc basis, and as part of other studies [e.g., 42]. Stools were deposited in 2 ml screw cap tubes and stored at 4˚C in the laboratory until DNA extraction.

Molecular techniques
DNA extractions from stool samples were performed using the Isolate fecal DNA (Bioline) or QIAamp DNA Stool (Qiagen) kits following the manufacturer's instructions and carried out at the Hopkirk Research Institute. DNA extraction required disruption of the oocyst/cysts using a beadbeater (Tissue Lyser II, Qiagen) at 30 Hz for 5 min. DNA quantification was measured using the Qubit fluorometer. A fragment of the gp60 and gdh genes were PCR amplified using a combination of external and internal primers (S1 Table) for Cryptosporidium [43,44] and Giardia [31], respectively. Both strands of the amplification products were sequenced using Big Dye Terminator version 3.1 reagents and an ABI 3730XL automated DNA sequencer (Applied Biosystems, Foster City, California, USA).

Data analysis
DNA sequences analysis. Consensus sequences were assembled from forward and reverse reads and edited manually using Geneious v. 10.1.3 [45]. The sequences derived were used to identify species and genotypes of Cryptosporidium and Giardia by aligning to sequence entries in nucleotide databases using the program BLAST (http://www.ncbi.nlm.nih.gov/blast/) and checked by their corresponding genotype [e.g., 46].

Worldwide distribution of subtype families and assemblages
We extracted a dataset of available gp60 and gdh sequences from GenBank for the same species of Cryptosporidium and Giardia found in New Zealand. The search strategy used consist of one search string (species name) combined by the Boolean operator "AND" for gene name (gp60 or gdh). To check for duplicate sequences, accession numbers were compared between datasets from different countries. All results were verified and cleaned for further analyses (final searches were conducted on February 20, 2016). Sequences of gp60 and gdh for each specific country were allocated to the corresponding subtype families/assemblages following the source information from GenBank when available. In cases where this information was not available in GenBank we either aligned all sequences using SATé-II v.

Extrapolation and similarity of parasite genotypes
We used a proportional similarity index (PSI) in the R package spaa [50] to calculate niche overlap (co-occurrence) patterns among countries according to the type and number of genetic variants reported. Values close to 1 indicate very similar countries in their genotype diversity, whereas values close to 0 indicate countries with dissimilar genotypes. Next, we used the "specaccum" function in the R package vegan [51] to generate genotypes (cf. species) accumulation curves [52] by plotting the total number of different subtype families/assemblages against the number of all countries reporting data on GenBank combined and all countries in the world (~200). Rarefaction was used to produce smooth mean parasite genotypes variability and calculate 95% upper and lower bounds.
Pairwise F ST and an analysis of molecular variance (AMOVA) were performed in Arlequin v.3.5.2.2 [53] using the sequence data obtained from GenBank to recognize the genetic differentiation between pairs of populations and the level of population genetic structure, respectively. Gblocks v.0.91b [54] was used to remove nonconserved positions adjacent to gaps and nucleotide repeats in the protein coding-gene gp60.

Accession numbers
DNA sequences have been submitted to GenBank: accession numbers KY123918-KY124121. Note that only unique haplotypes were submitted but similar haplotypes found in different hosts were repeatedly submitted (see S2 Table).

Molecular identification of species and genotypes
A total of 579 sequences were obtained for Cryptosporidium and 1523 for Giardia (S3 Table). The sequences presented > 98% identity and E-value of 0.0 with BLAST searching against sequences in GenBank, confirming a reliable identification of species and genotypes. Cryptosporidium species detected in New Zealand using the gp60 marker were C. parvum, C. hominis, C. cuniculus and C. erinacei. Our molecular analysis identified 332 samples with the presence of C. parvum, 241 C. hominis, four C. cuniculus and two C. erinacei. Samples of C. parvum were collected from humans (86.4%), cattle (13%) and sheep (0.6%), whilst the remaining species were collected only from humans (Fig 2 and S3 Table). Subtype families found in C. parvum were IIa, IIc, IId and IIe (Fig 2). The most common subtype families were IIa (78.3%) and IId (20.5%). The subtype families in C. hominis most frequently identified were Ib (54.8%) and Ig (33.6%). Other subtype families found in our study were Ia, Id, Ie and If, which represented the remaining 11.6% (Fig 2 and S3 Table). Cryptosporidium cuniculus and C. erinacei were assigned to subfamilies Vb and XIIIa, respectively (Fig 2 and S3 Table). Assemblages from A to F of G. intestinalis are presented in New Zealand but assemblage B is notably dominant (79%) and found in humans, domestic, wild and zoo animals (Fig 3 and S3 Table).

Global geographic distribution of gp60 subtype families and gdh assemblages
Cryptosporidium parvum and C. hominis were found across all continents (Fig 4a and 4b and S4 Table) but Australia and the United Kingdom accounted for the majority of data for both pathogens (S4 Table). Ten subtype families of C. hominis (Ia-Ik) are found around the world whilst 16 have been identified for C. parvum (IIa-IIp). Cryptosporidium hominis subtype families Ia, Ib, Id and Ie are the most abundant worldwide (S4 Table). Regardless of the high diversity of subtype families in C. parvum, IIa and IId are the most abundant. A more restricted geographical distribution is reported for C. cuniculus and C. erinacei, with data from Australia, China, Czech Republic, Poland, the United Kingdom, Algeria and Germany (S4 Table). Both of the current recognized genotypes for C. cuniculus (Va and Vb) were reported in China and the United Kingdom but Vb has been additionally found in Australia, Czech Republic and Poland.
We extracted 4348 records corresponding to G. intestinalis assemblages from 64 countries spanning all continents except Antarctica (Fig 4c and S4 Table). However, the majority of available data were retrieved from China, Australia and Brazil. Assemblages A and B represent more than 60% of the reported data worldwide (S4 Table). There are two additional assemblages (G and H) that are not found in New Zealand. Assemblage G is reported in Australia, China and Spain whilst assemblage H is found in Colombia and USA.

Countries similarity and global genotypes accumulation curves
Proportional similarity analyses were carried out only between countries with more than 10 records retrieved. In general, C. hominis showed intermediate to high values (> 0.5) of similarity for most country comparisons with New Zealand visually similar to Australia, Portugal, Sweden and the United Kingdom (Figs 5 and S1). Cryptosporidium parvum showed high similarity (> 0.75) among most country comparisons with exception of comparisons against Uganda, Nigeria, China, Peru and India (Figs 6 and S2). In this case, New Zealand is most like countries from Europe, including the Netherlands, Sweden, the United Kingdom and Portugal and countries from other hemispheres such as Canada and Australia. High values of similarity were also observed for comparisons of G. intestinalis between countries spanning different continents but Denmark, Croatia, Romania, Canada and USA were visually dissimilar (< 0.5) (Figs 7 and S3). New Zealand is similar to many countries from across the world, from Europe to Asia, Africa and Australia (Fig 7). Fig 8 shows the cumulative number of genotypes for all countries combined. Accumulation curves for G. intestinalis and C. hominis were starting to show some downward curvature below 10 genotypes, which indicates a declining rate of parasite genetic type discovery, but only G. intestinalis approached an asymptote. The number of genetic types found in New Zealand for these two species is close to the plateau area of the curves (Fig 8). Interestingly, C. parvum show a different pattern with an upward curvature which indicates that many more genetic types than those currently recorded will be found if sampling in other countries is performed. However, this species may never attain an asymptote (Figs 8 and S4) indicating that other genotypes will be likely detected worldwide as well as in New Zealand. Rarefaction analysis using only countries with more than 10 records did not show differences of these trends (S5 Fig).
Population structure analyses were carried out between countries with more than 10 sequences available in GenBank. Sequences that do not overlap at least 50% of the total length of the dataset were discarded. Gaps and nucleotide repeats were not considered for pairwise F ST and AMOVA analyses. Large and significant F ST values are an indication of population subdivision and/or low gene flow between populations. There was not significant (p < 0.0001) population differentiation of comparisons among Nigeria, Mexico, Kenya and USA for C. hominis. New Zealand showed high significant pairwise F ST when compared with other countries with exception of comparisons among Nigeria, Mexico, USA and Australia (S5 Table). A lack of population differentiation for C. parvum was evident between most comparisons of estimated pairwise F ST (S5 Table). We did not find structured populations among closely geographical European countries (e.g. United Kingdom, Sweden, Spain, Slovenia and Poland) as well as other countries spanning around the world, with exception of most comparisons among Egypt, India and China (S5 Table). New Zealand showed no population differentiation with Australia, Argentina, USA, Sweden, Poland, Slovenia and Turkey. Giardia intestinalis also revealed no population subdivision among most countries including those from different continents, with exception of most comparisons among Ethiopia, Egypt, Denmark, Croatia, Romania and Colombia. New Zealand presented no population subdivision with countries from different latitudes including Norway, Malaysia, Uganda, Sweden, Indonesia, Japan, Germany, Australia, among others (S5 Table). However, there was substantial polymorphism within populations (> 80%) found in the AMOVA which result in low but significant proportions of the total variation partitioned among countries (14%, 19% and 16%, p < 0.0001 for C. hominis, C. parvum and G. intestinalis, respectively).

Discussion
Our relatively large sample size and genotyping diversity of Cryptosporidium and Giardia from New Zealand allowed us to perform comparative analyses with available worldwide data. All species and genetic variants reported in New Zealand of either Cryptosporidium or Giardia are found in several countries and this homogenization suggests that their populations are well established and mixed throughout the globe [55,56]. Some cases of structure relating to geography have been indicated by previous studies [e.g., 57, 58] but they also show sharing alleles among different populations around the world. We also found evidence of geographically- Genotypes diversity and distribution of Cryptosporidium and Giardia associated differentiation with high F ST values that were generally confirmed by low PSI values, despite large confidence intervals (S1, S2 and S3 Figs). The evidence of structure is likely a consequence of limitations in the estimators due to an uneven number of samples, high withinpopulation variation and reporting bias [59]. The use of a single and high variable locus and non-contemporaneous samples also affect the probability of a reliable estimation of the genetic differentiation from sequence data [60,61]. Furthermore, the population structure of these species, going from clonal to panmictic, is influenced by diverse factors such as transmission dynamics and host-specificity [57,62]. The global distribution of species and genotypes within Cryptosporidium and Giardia might represent the geographic range shifts of domesticated animals, historical movement of livestock and transmission facilitated by human travel and non-native fauna [35, 63,64]. This is likely the case of the rare C. cuniculus and C. erinacei causing human disease in New Zealand. Cryptosporidium cuniculus has recently expanded its known host range from rabbits to humans and kangaroos [65] and it is a human pathogen of public health importance related to waterborne outbreaks in the United Kingdom [66]. The presence of this species and a similar genotype in New Zealand and few other countries may suggest a dispersal event caused by human factors. Most recently, C. erinacei has been identified as a cause of gastroenteritis in . More systematic sampling of local and introduced fauna along with whole genome sequences over short timescales will allow us to identify major functional loci involved in host specificity, proliferation and host shifting. This information will provide insights into the zoonotic potential of different genotypes, host range, crossspecies transmission and risks for host shifts to humans [67].
We acknowledge that Cryptosporidium species diversity will be underestimated because most species within the genus fail to amplify with standard primers used for the gp60 marker [68,69]. For instance, C. andersoni and C. bovis have been previously identified in cows in New Zealand using the 18S rRNA gene markers [70]. Moreover, our approaches are dependent on sequences submission, accurate taxonomy and complementary information in Gen-Bank. Some genotypes have been recognized as species or wrongly submitted to a different species. Cryptosporidium erinacei was previously described as C. sp. hedgehog [20] with the single XIIIa subtype family but a subtype family VIIa was also proposed (S4 Table) [71]. Likewise, the subtype family IIf in C. parvum has recently been classified as C. tyzzeri [72]. This is also the case for G. intestinalis. Several assemblages of G. intestinalis are genetically isolated lineages and have been nominated as different species [73]. Some genotypes in C. parvum and G. intestinalis might be named as new species when more data become available. Sequences from public databases are an extremely valuable source of information [74]. Genotypes and abundance in different countries may not be totally represented because some researchers avoid submitting multiple sequences if the same genotype is already found within the sequence data collection of GenBank [74]. The potential bias caused by this practice does not help to capture a substantial part of the genetic diversity globally. This information is fundamental for analyses of the evolution and movement of the parasites, the possible origin and patterns of dispersal of these organisms or their genotypes, prioritization on potential hazard reduction to susceptible hosts and control to prevent spread between regions. For example, C. parvum haplotypes IIg, IIh, IIi are reported in Uganda, IIm in Bangladesh, and IIn in India; C. hominis Ii is found in Asia; and G. intestinalis assemblage G in Australia. Given current reporting practices, it is unclear if these genotypes are unique to these regions deserving greater attention or if they are widely distributed but simply unreported on GenBank. Using our literature review framework (S1 References), we found that subtype families IIg, IIi and IIm were also reported in India, Ii in USA and assemblage G in China and Spain. We predict that unreported sequences from different countries will decrease the geographical population structure, however, it will be dependent on whether researchers do not report sequences because of 100% identity or because they are the same genotype. Another alternative source of information to measure the population genetic variation of these parasites is ZoopNet [75,76] but this database is not publicly available. A previous study using ZoopNet found high genetic diversity in C. hominis and C. parvum from Australia and the United Kingdom, respectively [75]. Estimations of the genetic diversity in this study align with our results and the similarity between these nations and New Zealand.
It has been demonstrated that mixed infections (different alleles per isolate) within a host can produce recombinant genotypes that comprise alleles inherited from each parental line [77,78]. These recombinant events are associated to the high probability of finding new genotypes (S4 Fig). Cryptosporidium parvum displayed a large genotypic variation as a result of gene combinations and host range shift [67]. Similarly, genetic recombination has been shown to occur commonly in C. hominis and G. intestinalis in developing countries [59,78]. This is an important issue for public health risks because genetic recombination is a driving force for the emergence of virulent subtypes [79]. The distribution of these species in humans might vary among geographical areas and socioeconomic conditions [68] and interestingly, high genetic diversity and new genotypes have been particularly found in children or unexpected hosts in rural settings from developing countries [e.g., 80,[81][82][83][84][85]. Unfortunately, the data reported from developing countries is low with limited information from Africa, Latin America and populous countries such as China and India [30].
Comprehensive surveillance strategies will require efforts at multiple scales to help decrease the significant burden caused by Cryptosporidium and Giardia. Understanding of the transmission pathways of these parasitic protozoans is imperative. The large number of zoonotic cases highlights the importance of rates of contact and host densities. For example, local spread of Cryptosporidium and Giardia in New Zealand has been facilitated by cattle densities in the catchment of drinking water particularly within the most populous and agricultural regions [37,39,42,86]. It is uncertain in our study if this is the transmission pathway, particularly for Giardia which shows less seasonality in case notification rates (Fig 1), but a substantial number of diarrheal cases around the world suggest that infections among humans has an increasing trend caused by waterborne transmission [e.g., 24, 87]. Animal exposure, seasonal patterns and contaminated food are other key factors for transmission in certain geographical settings and more detailed analysis will be need to identify the pathways of human cryptosporidiosis and giardiasis.