Immigration and establishment of Trypanosoma cruzi in Arequipa, Peru

Changing environmental conditions, including those caused by human activities, reshape biological communities through both loss of native species and establishment of non-native species in the altered habitats. Dynamic interactions with the abiotic environment impact both immigration and initial establishment of non-native species into these altered habitats. The repeated emergence of disease systems in urban areas worldwide highlights the importance of understanding how dynamic migratory processes affect the current and future distribution and abundance of pathogens in urban environments. In this study, we examine the pattern of invasion of Trypanosoma cruzi—the causative agent of human Chagas disease—in the city of Arequipa, Peru. Phylogenetic analyses of 136 T. cruzi isolates from Arequipa and other South American locations suggest that only one T. cruzi lineage established a population in Arequipa as all T. cruzi isolated from vectors in Arequipa form a recent monophyletic group within the broader South American phylogeny. We discuss several hypotheses that may explain the limited number of established T. cruzi lineages despite multiple introductions of the parasite.


Introduction
Habitat alterations are transforming biological communities worldwide [1][2][3]. The current and future geographic distributions of many species in disturbed environments depends upon their interactions with novel biotic and abiotic features during immigration and while establishing a growing population [4]. Although many species fail to establish thriving populations in altered habitats, others are well-suited to migrate to, and prosper in, these novel environments. For example, several populations of plant [5][6][7], insect [8][9][10][11], mammal [12,13], and bird [14,15]  environments [1], while several microbial species benefit from the abundance of humans and human-associated hosts or vectors in similar habitats [16]. Although conservation efforts have focused primarily on the impacts of environmental changes on native plant and animal species, establishment or population growth of disease-causing microbial populations can have a strong negative impact on populations of native flora and fauna [17] in addition to their impact on human health and economy [18]. The rate or impacts of invasions of infectious microbes may be mitigated through public health programs based on an understanding of the dynamic processes determining immigration and establishment rates. The regularity at which disease systems are emerging in many urban and urbanizing areas underscores the importance of understanding how disease-causing microbes migrate to, and establish in, urban environments [16], one of the most dramatic examples of habitat alteration [19,20]. In this study, we examine the patterns that govern how Trypanosoma cruzi-the causative agent of Chagas disease in humans-invaded and colonized the city of Arequipa, Peru. Invasion of a new environment by a pathogen occurs in three stages: (1) immigration, or the movement of an individual to the new environment; (2) establishment of a population via reproduction and population growth; and (3) local dispersal [4]. The many studies focusing on the emergence of disease systems have generated a wealth of knowledge concerning factors affecting population growth [21,22] and considerable progress in understanding local dispersal [23]. For example, prior studies concluded that human-created containers increase the abundance of standing water that provide breeding habitats for the mosquitos that spread dengue virus [24]. Relatively few studies, by contrast, have investigated the early stages of invasion, the immigration and establishment processes, due to the practical difficulties of collecting the necessary data in the early stages of establishment of a novel species. The Chagas disease system in Arequipa, Peru, provides an ideal system in which to study the early invasion processes in urban environments. Arequipa has experienced rapid urbanization and human population growth in the previous half century [25]. The expansion in population sizes and geographic ranges of humans and their domestic animals provide T. cruzi and its only known insect vector in Arequipa, Triatoma infestans, with abundant and accessible hosts [26][27][28][29][30][31]. Estimates suggest that T. cruzi infection prevalence among humans in Arequipa exceeded 5% [27,[29][30]. The population history of the T. cruzi currently in Arequipa-including the geographic locations of the migrants that established the current population, the rate at which migrants enter and establish in the area, and the age of each established lineage-have not been investigated. Here, we performed phylogenetic analyses of maxicircle DNA, a nonrecombining circular element analogous to mitochondrial DNA, to estimate the number of independent T. cruzi lineages established in Arequipa and to estimate the timing of each establishment event. We assessed whether extant T. cruzi in Arequipa form a single monophyletic group, indicative of the establishment of a single migrant lineage, or multiple diverse clades, indicative of multiple independent immigration and establishment events.

Results
The maxicircle sequence is ideal for population genetic and phylogenetic analyses because it is conserved among diverse T. cruzi lineages and is non-recombining [32], unlike the nuclear genome which undergoes regular recombination [33]. Further, phylogenies based on multiple combinations of T. cruzi maxicircle and nuclear genes yield largely similar topologies [34]. While there was substantial maxicircle sequence diversity among samples across South American isolates, almost no diversity was observed among the samples from Arequipa. Over 13% (2055/15367bp) of the maxicircle sites were polymorphic among all 136 samples while only 16 sites were polymorphic (0.1%) among the 123 samples collected within Arequipa (Table 1).
Similarly, estimates of diversity among the samples from Arequipa derived from population genetic statistics were substantially lower than the total diversity across all samples (π = 6.8 � 10 −5 vs 8.18 � 10 −3 ; θ = 1.93 � 10 −4 vs 2.44 � 10 −2 ; average pairwise distance 1.04 vs 126). In contrast to the limited genetic diversity within Arequipa, other locales from which multiple isolates were sampled contain substantial genetic diversity among limited samples (N<3; Fig 1).
The monophyletic group containing all 123 samples derived from T. infestans and domestic animals collected in Arequipa coalesce in the very recent past, despite collection sites extending throughout Arequipa and surrounding towns across 7 years (Fig 1). The extremely low genetic diversity among these 123 samples provided insufficient information to resolve any phylogenetic relationships within this monophyletic group. The estimated divergence time suggests that the common ancestor of Arequipan T. cruzi lived 8.8-19kya (S1 Fig). However, care should be taken when interpreting divergence time estimates because of the low sequence diversity within the Arequipan clade and because of the uncertainty in divergence time at the root of the phylogeny. One sample derived from an infected human in Arequipa (Fig 1C) belongs to a lineage that is more closely related to samples collected in Rio Grande, Brazil than to the other samples collected in Arequipa (Fig 1). This sample is distinct from all other samples in Arequipa suggesting that T. cruzi can immigrate to Arequipa but may not establish in the vector population. The population size of an unsampled T. cruzi lineage-if present in the T. infestans population-must be at least 42 times smaller than the dominant population in Arequipa to have remained undetected by chance (p<0.05). A maximum likelihood phylogeny that included 18 additional partial maxicircle sequences obtained from NCBI (S1 Table) supports the result that all T. cruzi isolates collected in Arequipa form a monophyletic group derived from a recent common ancestor (S2 Fig).
In contrast to the monophyletic ancestry found in Arequipa, genetic diversity was apparent in the samples collected from other regions, despite limited numbers of samples (N<3). Population genetic diversity among the three samples collected La Esperanza, a town of 57 houses in the Cutervo Province of Cajamarca, Peru (π = 4.18 � 10 −2 ; θ = 4.18 � 10 −2 ), are much larger than those in Arequipa despite the limited number of samples [35]. The genetic relatedness and geographic distance among South American isolates are not correlated (Mantel test among DTU I: r = 0.79, p = 0.083, Mantel test among DTU VI: r = -0.5, p = 0.833; Linear regression, r = 0.22, p = 0.534; Table 2

Discussion
The invasion of recently altered environments by non-native species impacts the population health of native species as well as human health and economy. Investigation of the dynamic process of immigration and establishment of non-native species into these disturbed habitats has the potential to mitigate the impacts of pests and pathogens that are detrimental to human   populations as well as agricultural and native species [17]. The analyses presented here suggest that the population of T. cruzi in Arequipa, Peru, descended from a recent invasion consisting of only one evolutionary lineage. This conclusion is supported by the extremely limited genetic diversity observed among T. cruzi isolates sampled within and around the city, in contrast with considerable genetic diversity observed regionally and at other locales (Table 1; Fig 1). Several non-exclusive hypotheses that can be experimentally tested in the future may explain these results, including a low immigration rate, that immigration is common but immigrants rarely establish populations as a result of the low transmission rate between hosts and vectors, or that there is a high turnover rate among T. cruzi lineages. Successful invasion of a novel geographic area is a function of the rate of immigration, the temporal duration that a habitat has been suitable for establishment, and the probability that an immigrant can reproduce and establish a population. The influx of humans and associated products and domesticated animals into Arequipa over the last~60 years due to rapid urbanization and economic growth [25,36] has provided many opportunities for T. cruzi immigration. However, many migrants to Arequipa come from areas in which they were not exposed to T. cruzi, such as the neighboring regions of Puno and parts of Cusco which are beyond the range of T. infestans [37]. Nevertheless, one T. cruzi lineage that does not appear to be circulating within the T. infestans population was detected in an infected human (Fig 1), suggesting that T. cruzi immigration through human movement can occur. These data suggest that multiple T. cruzi lineages may have immigrated to Arequipa with all but one failing to transmit sufficiently to establish a population. Future studies will be necessary to identify the source, rate, and potential mechanism of T. cruzi immigration, independent of establishment probability, through analyses of the genomic diversity in human infections.
Prior studies across multiple species suggest that the majority of immigrants in most species that reach a novel geographic area fail to establish a population due to both inhospitable local environmental conditions [1,38] and stochasticity [39,40]. Environmental factors that can reduce establishment probabilities include unfavorable abiotic conditions, limited food resources or vectors, or an abundance of predators or competitors. The establishment probability of immigrant T. cruzi in an urban environment is likely depressed by a low contact rate between infected humans and vectors [41][42][43][44]. For example, many human immigrants moved to locations in the city without established T. infestans populations [25,37], which would result in few opportunities for T. cruzi transmission from infected immigrant humans to insect vectors, thus curtailing establishment probabilities. Further, the probability of an infected T. infestans transmitting T. cruzi to novel hosts is low [42,45,46], thus reducing the probability that a recently-immigrated T. cruzi lineage will establish a population. Both a low probability of establishing a population due to limited contact between T. infestans and infected immigrant humans or due to the limited probability of transmission from vectors to host are consistent with the observation of only a single established T. cruzi lineage in vectors in Arequipa.
The observed establishment probability is likely independent of competition among T. cruzi lineages. Competitive exclusion-where an existing population prevents the invasion of new immigrants-appears unlikely as the majority of city blocks do not contain T. cruzi [25,47] despite substantial vector populations [48]. Under the competitive exclusion hypothesis, one might expect different T. cruzi lineages establishing in different areas of the city. Indeed, multiple T. cruzi lineages do co-circulate within the same locality [44,49-53], as seen in the samples sequenced here from La Esperanza (Fig 1), and even within the same host [54,55], suggesting that competition is not preventing the establishment of multiple lineages in Arequipa.
The limited genetic diversity within Arequipa could result from a recent replacement of a previously dominant lineage through natural population processes. While the continuous substitution of a dominant strain through natural selection or drift is common in well-mixed populations, geographic structure within populations tends to result in the persistence of genetic diversity [56]. The absence of samples deriving from a previously established T. cruzi lineage in the fragmented urban and inter-district landscapes, much of which contains an active vector population but no T. cruzi [25,47,48], is suggestive that no previous lineages dominated this area. Passaging T. cruzi through animal hosts in order to isolate T. cruzi from infected bugs presents a potential for selection bias against specific genotypes. However, no other T. cruzi lineages were detected circulating in the vector population, despite the spatial and temporal range of samples in our dataset (7 years), suggesting that selection bias does not occur unless both guinea pigs and mice completely exclude the same genotypes.
In conclusion, all relevant data suggest that the T. cruzi circulating in vector populations prior to the recently-controlled epidemic in the city of Arequipa descended from a single immigrant lineage. While the ancestral lineage that gave rise to all extant T. cruzi in Arequipa did not necessarily reside in Arequipa, the data suggest that the common ancestor of all analyzed isolates immigrated recently and founded a population. The single divergent lineage found in a human patient suggests that T. cruzi may regularly immigrate to the city but that immigrants rarely establish populations.

Ethics statement
The Institutional Animal Care and Use Committee (IACUC) of Universidad Peruana Cayetano Heredia reviewed and approved the animal-handling protocol used for this study (identification number 59605). The Universidad Peruana Cayetano Heredia Ethics Committee approved human subject research and approved the collection of bugs from households (identification number 52186). Human subject research participants provided written consent for the collection of bugs from his or her household and for human subject research. The IACUC of Universidad Peruana Cayetano Heredia is registered in the National Institutes of Health at the United States of America with PHS Approved Animal Welfare Assurance Number A5146-01 and adheres to the Animal Welfare Act of 1990 [57].

Sample collection
DNA from 133 T. cruzi isolates were analyzed to determine phylogenetic relationships (Fig 2;  Fig 3; S1 Fig). The majority of samples were isolated from T. infestans bugs collected from houses throughout Arequipa (N = 114). T. cruzi isolated from bugs provide the most unbiased sample because there is no host selection for a specific genotype as T. cruzi from all hosts must pass through this vector in order to infect a new host. Additionally, there is only one vector species in Arequipa and thus there is no vector-specific selection bias. Samples were collected from houses throughout 11 districts in Arequipa from 2008 to 2015, encompassing the majority of the parasite's range within Arequipa. The subset of isolates used in this study (S2 Table) were chosen to maximize the chance of detecting diversity 1) by choosing isolates derived from bugs collected in different houses each year in all but six cases (15 isolates from 6 houses, 1 derived from all other houses) in order to minimize the chance of isolating sibling parasites and 2) by using samples that are spatially and temporally distributed to minimize sampling closely related isolates. Samples were collected as the epidemic was expanding and continued as it was controlled. This sampling scheme represents the optimal chance of detecting minor lineages and capturing diversity within the major lineage. Three of these samples were obtained using xenodiagnosis as described in Chiari & Galvão (1997) [58]. An additional ten samples from Arequipa were isolated from the blood of guinea pigs (N = 7), dogs (N = 2), and a human (N = 1). Six isolates were derived from Panstrongylus lignarius (N = 5)-also known as P. herreri [59]-and one guinea pig (N = 1) collected in the small towns of La Esperanza, Campo Florido, and Naranjal in northern Peru [35]. Cultures of three previously established strains isolated from humans in Bolivia (Bol-SH001 and Bol-DH29) and São Paulo, Brazil  Immigration and establishment of natural Trypanosoma cruzi populations (TC-y) were provided by the Infectious Diseases Research Laboratory at Universidad Peruana Cayetano Heredia (Fig 3).
T. cruzi was isolated from vertebrates (N = 8) using an adaptation of an artificial feeding system that was originally described in Harington (1960) [60]. Briefly, each blood sample was collected with citrate-phosphate-dextrose, transferred into a small plastic jar, and covered with a latex membrane fitted tightly with a rubber band. The jars were placed into an incubator and gradually heated to 35˚C. Once the temperature was reached, the jars were inverted to allow uninfected T. infestans to feed through the membrane for 15 minutes. T. cruzi from the eight laboratory-infected T. infestans, all 114 naturally-infected T. infestans, and five naturallyinfected P. lignarius were passaged through guinea pigs or mice in order to avoid isolating other microbes present in the vector, as described in Castillo-Neyra et al. (2016) [61]. Feces from infected vectors were injected into guinea pigs or mice and T. cruzi was isolated from the blood of each experimentally-infected mammal. T. cruzi were directly isolated in LIT culture media from the blood samples of three naturally-infected guinea pigs collected in Arequipa without passage through T. infestans.
Reference sequences of three T. cruzi isolates obtained from NCBI database were used in subsequent analyses: Silvio, isolated from a human in Para State, Brazil; Esmeraldo, isolated from a human in Bahia State, Brazil; and CL Brener, isolated from a human in Rio Grande, Brazil [62] (Fig 3).

Sequencing
DNA from all laboratory cultures was extracted using Qiagen DNEasy DNA Purification Kit. 150bp single-end read libraries were prepared using TruSeq Nano kit and sequenced to an average depth of >50X using Illumina's NextSeq500. Six T. cruzi isolates were prepared in duplicate, and one in triplicate, to allow estimation of sequencing error. Low quality bases were trimmed from raw reads using trimmomatic-0.32 [63].

Sequence assembly
Bowtie2 [64] was used to assemble maxicircle sequences to the most closely related reference sequence, Silvio (gi|225217165|gb|FJ203996.1), obtained from NCBI [62]. Duplicate reads were removed from the assembly using Picard's MarkDuplicates functionality [65]. The assembly had an average depth of >600X across all maxicircles. Maxicircle consensus sequences were determined using VarScan [66], ensuring highly-confident base calls by requiring a 60% match to call each SNP.

Maxicircle alignment
The T. cruzi maxicircle contains several conserved genes spanning more than 15kb. All assembled maxicircle sequences and the reference were aligned to the Silvio partial maxicircle sequence (gi|225217165|gb|FJ203996.1|), Esmeraldo strain complete maxicircle (gi|85718082| gb|DQ343646.1), and the CL Brener complete maxicircle (gi|85718081|gb|DQ343645.1) downloaded from the NCBI database. The sequences were aligned using MUSCLE as implemented in MEGA7 [67]. The ends were trimmed so that all sequences started and ended on the same nucleotide, resulting in a final alignment of 15357bp.

Phylogenetic analyses
The non-recombining T. cruzi maxicircle is ideal for coalescent analyses because such analyses are based on the assumption that there is no recombination. Phylogenetic analyses of the 15357bp maxicircle sequence from all samples and reference strains were performed using BEAST 1.8.4 [68]. Phylogenetic analyses assumed a model of sequence evolution in which the rates of A!T, C!G, and G!T are equal (123343) with γ-distributed rate heterogeneity. An Extended Bayesian Skyline tree prior [69] with constant evolutionary rates across lineages (strict clock) was chosen based on BEAST Model Test implemented in BEAST2 [70]. Starting with a UPGMA tree and running one Markov chain Monte-Carlo chain for each of five independent runs of 20 million iterations sampling every 2000 iterations ensured sufficient mixing after a 10% burn-in (ESS values >200 in Tracer v1.6.0) [71]. Tree files were combined using LogCombiner1.8.4, excluding a 10% burn-in for each. A Maximum Clade Credibility tree was generated from the combined tree file using TreeAnnotator 1.8.4 and FigTree v1.4.2 was used to visualize tree files (available at http://beast.bio.ed.ac.uk). Phylogenetic analyses were performed using the BEAGLE library to increase computation speed [72,73].
Eighteen partial T. cruzi maxicircle sequences (>4.5kb) available on NCBI (S1 Table) were aligned to the samples sequenced here and the reference sequences. Gaps were removed, yielding a final alignment of 2551bp. These eighteen sequences were chosen because they provided the longest overlapping regions of the maxicircle and provide sufficient information for phylogenetic analyses. A Maximum Likelihood estimation was performed using MEGA7 with a Tamura-Nei model and 1000 bootstrap replicates (S2 Fig).

Statistical Analyses
Metrics of population genetic variation, π and θ, were calculated using MEGA7 [67]. Assuming the sample from 123 infected T. infestans is representative of the T. cruzi population in vectors in Arequipa, the probability that a distinct lineage could be co-circulating but not detected by chance can be calculated using a binomial distribution. The minor lineage must constitute less than 2.41% of the total population in order for there to be a statistically significant chance that a distinct lineage was not detected in any of 123 sampled T. cruzi. Mantel tests and linear regression were used to test for a correlation between geographic and genetic distance. Mantel tests were performed using the vegan package in R [74]. Two Mantel tests were performed: one testing for correlation among samples belonging to DTU I and another among samples belonging to DTU VI. Statistics were based on Spearman's correlation using 100,000 permutations. A linear regression based on Spearman's correlation was performed using the rcorr function in R (S3 Fig). Linear regression was performed on all pairwise distances between samples belonging to the same DTU.
Supporting information S1 Fig. Most recent common ancestor to extant T. cruzi population in Arequipa lived as recently as 8.8kya. Estimates for divergence timing are displayed as 95% confidence intervals at each node. Estimates are based on rooting the phylogeny at 3mya. Sample collection locations are shown at the tips. The DTU each sample belongs to is labeled with its corresponding Roman Numeral in superscript. Due to the low diversity in maxicircle sequence among Arequipan isolates and the uncertainty in dating the tree root, care should be taken when interpreting divergence time estimates. (TIF) S2 Fig. The 123 Arequipan isolates form a monophyletic group even after adding all 18 DTU I partial maxicircle sequences greater than 4.5 kb obtained from NCBI. Short maxicircle sequences (<4.5kb) were not included in the analyses due to limited overlap with each other and the limited number of informative SNPs. The total alignment length was 2551 base pairs. Clades and branches that include samples sequenced here (Fig 1) are in bold font. Bootstrap support greater than 0.8 is labeled on nodes. (TIF)

S3 Fig. Euclidean distance is not correlated with genetic distance between maxicircles.
Only within-DTU distance data was used to perform the linear regression (red line). (TIF) S1