Genetic variation and population structure of Diaphorina citri using cytochrome oxidase I sequencing

Citrus greening disease, or huanglongbing (HLB), is currently one of the most devastating diseases of citrus. The bacteria thought to be responsible for the disease, Candidatus Liberibacter asiaticus impact the majority of commercial citrus species worldwide. These bacteria are transmitted by the Asian citrus psyllid (ACP), Diaphorina citri Kuwayama, which is now found in most citrus growing regions. With no known cure, ACP-vectored HLB is responsible for significant economic losses to the global citrus industry. A better understanding of the global genetic diversity of D. citri would improve current and future pest management and mitigation programs. To assess the genetic diversity of D. citri in worldwide collections, a total of 1,108 sequences belonging to ACP gathered from 27 countries in the Americas, the Caribbean, Southeast and Southwest Asia were examined for the study. 883 D. citri came from 98 locations in 18 different countries, and were sequenced using a 678bp fragment of the mitochondrial cytochrome oxidase I (COI) gene. Additionally, 225 previously-reported D. citri COI sequences, were also included in our analysis. Analyses revealed 28 haplotypes and a low genetic diversity. This is in accordance with previous reports on the little diversity of D. citri in worldwide populations. Our analyses reveal population structure with 21 haplotypes showing geographic association, increasing the resolution for the source estimation of ACP. This study reveals the distribution of haplotypes observed in different geographic regions and likely geographic sources for D. citri introductions.


Introduction
The Asian citrus psyllid (ACP), Diaphorina citri Kuwayama [Hemiptera: Liviidae], is a phloem-feeding insect that in addition to causing physical damages to the plant, is the most effective vector of Candidatus Liberibacter asiaticus, and Candidatus Liberibacter americanus, the causal agents responsible for Huanglongbing (HLB) disease [1]. Nymphs and adults of D. citri acquire the bacterial pathogen when feeding on infected plants. Once the bacteria are inside the vector, they replicate, invade the salivary glands, and thus enable Ca. Liberibacter to infect other citrus plants through psyllid feeding [2][3][4]

Insect collection
A total of 883 D. citri samples were collected from different citrus host-plants, and made available to us by collectors from 18 different countries between 2008 and 2012 (Table 1). Additionally, 225 D. citri COI sequences deposited in GenBank by Boykin et al. [27], Lashkari et al. [29], and Chaitanya et al. [30] were included (Table 2), for resulting in a total of 1,108 individuals included in this global phylogenetic analysis.

Genomic DNA isolation, Polymerase Chain Reaction (PCR) amplification, and sequencing
Total genomic DNA was isolated from each individual using the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA). The primers (forward: DCITRI COI-L 5'-AGG AGG TGG AGA CCC AAT CT-3') and (reverse: DCITRI COI-R 5'-TCA ATT GGG GGA GAG TTT TG-3') designed by Boykin et al. [27] were used to amplify a partial fragment of COI from D. citri. The Polymerase Chain Reaction (PCR) contained 16.77μl of DNAase free water, 2.50μl of 10X Taq Buffer (TaKaRa Bio Inc, Mountain View, CA), 2.60μl of dNTP mixture 2.5μM each (TaKaRa), 0.13μl TaKaRa Ex Taq polymerase (5U/μl), 1μl of each primer at 10μM and 1μl of DNA template for a total volume of 25μl. The amplification was conducted in Applied Biosystems (Foster City, CA) Gene Amp1 PCR System 9700 thermal cycler and the following cycling parameters were used: 94˚C for 2 minutes, followed by 35 cycles of 30 seconds at 94˚C, 30 seconds at 53˚C, of 1 min at 72˚C extension and a final extension of 72˚C for 10 minute, as described in Boykin et al. [27].
The PCR products were stained with Blue/Orange 6X loading Dye (Promega, Madison, WI) and visualized with ethidium bromide on 1.5% electrophoresis agarose gels at 90V for 90 minutes in 1X TAE buffer. Documentation of these gels was performed using UVP Gel-Docit TS2 imager (Upland, CA). Before sequencing, the PCR products were purified with Exo-SAP-IT TM (Affimetrix, Santa Clara, CA), using the company protocol. Sequencing of PCR products was performed using bidirectional sequencing and 3' BigDye-labeled dideoxynucleotide triphosphates (v. 3.1 dye terminators, Applied Biosystems, Foster City, CA, USA), and run on an ABI 3730XL DNA Analyzer with the ABI Data Collection Program v. 2.0 at the Huck Institute's Nucleic Acid facility at Pennsylvania State University. Sequences were edited using Sequencher1 v. 5.0 (Gene Coders Corp. Ann Arbor, MI). The alignment was constructed in MEGA v. 6.0 [31], using Clustal W [32]. Sequences were trimmed of primers to 678bp.

Genetic diversity and population structure
Haplotypes (h) were identified using DnaSP v. 5.10.01 [33]. DNA Pairwise genetic distances were estimated in MEGA v.6. The number of haplotypes (h), haplotype diversity (Hd), nucleotide diversity (μ) the average number of pairwise differences (k) [34], and the Tajima's D [35] and Fu and Li's D and F [36] tests were determined with DnaSP v. 5.10.01 [33]. Tajima's and Fu and Li's tests were performed to test for neutral evolution and evaluate the potential for recent population expansion or contraction. General population genetics statistics were calculated to determine probable center of origins based on diversity. A haplotype network, which include haplotype frequencies, was constructed in PopArt [37] using the TCS statistical algorithm [38]. An analysis of molecular variance (AMOVA) was performed to determine the percentage of variation among geographic groups, among populations, and within groups, were performed in ARLEQUIN v. 3.5 [39].

Phylogenetic and haplotypes analyses
JModelTest 2 [40] was used to identify the best substitution model for the DNA sequence data. A Bayesian phylogenetic reconstruction was then conducted using Mr.Bayes v.3.2.3 [41] in Cipres [42] allowing transitions and transversions to have different rate. The analysis was performed using a Monte Carlo Markov Chain (MCMC) method. Four chains were run independently for 10,000,000 generations with sampling conducted every 1,000 generations. The COI sequence of the hackberry petiole gall psyllid (Pachypsylla venusta) was selected as an outgroup for the analyses (accession number: NC_006157). Additionally, a phylogenetic analysis using maximum likelihood methods was performed in PhyML v. 3.0 [43], using the nearestneighbor interchange tree rearrangement and 1,000 bootstrap replications. Output trees were visualized in Cipres with midpoint rooting.

Genetic variation
The final DNA sequence alignment of mitochondrial COI from 1,108 individuals was 678bp and contained a total of 29 polymorphic sites, 12 parsimoniously informative sites, and no indels. There were 18 synonymous changes and 11 non-synonymous. Unique sequences were submitted to GenBank (accession numbers: MH001373-MH001384). This resulted in 28 haplotypes shown by samples gathered from among the 26 locations analyzed for the study (Table 3). Of these haplotypes, 16/28 (57.14%) were identified from single individuals (singletons). The genetic diversity for the entire sequence dataset was low (<1%), and the maximum

Phylogenetic analyses
The HKY85 + I + G substitution model was selected as the best model based on Akaike Information Criterion ranking. Modeltest revealed a proportion of invariable sites of 0.1860, a transition/transversion ratio of 1.6314, and gamma shape parameter of 0.6100. The Bayesian ( Fig  1) and ML (not shown) phylogenetic analyses showed concordant topologies. The Bayesian Table 3. Geographic distribution and frequency of the 28 haplotypes of D. citri observed in this study.

Population structure
Geographic association was observed for genetic clusters and groups structure (Fig 1).  individuals from Singapore, 5/5 (100.0%) of the individuals collected in Mauritius, and 6/7 (85.71%) of the specimens from Reunion.
Using natural geographic barriers, previously posed hypotheses, and observed patterns of distribution, the data were delineated into five populations: North and Central America (NCA), South America (SAM), The Caribbean (CAR), South West Asia (SWA) and South East Asia (SEA). The AMOVA confirmed the significance of population structure for the five geographic populations ( Table 4). The AMOVA rejected the null hypothesis that the five populations are homogeneous. The highest source of variation (75.92%) was observed within regions, according to the five populations we mention above, for those combinations we tested. Variation was also observed among populations (10.05%), and within populations (14.03%).

Regional population summary
South West Asia population. In all, 413 individuals were successfully sequenced. A total of 16 haplotypes were observed in this region. We observed a higher nucleotide diversity (0.00447), and a lower haplotype diversity (0.218), as compared with the estimated overall nucleotide and haplotype diversity ( Table 5). The most common haplotype recovered in this region is haplotype Dcit-6. This haplotype was observed in 364 (88.13%) of SWA individuals. This haplotype was observed in 361/374 (97%) individuals from Pakistan, 3/14 (21.42%) of the individuals from India, and 12/12 (100%) of the psyllids gathered from Colombia. We observed that most of the diversity for the entire dataset belonged within Pakistan and India. Collectively, the haplotype diversity for these two countries was estimated at Hd = 0.149.
Eight of the sixteen haplotypes were exclusive to Pakistan psyllids (n = 381). Dcit-11 was observed in three individuals from Pakistan and Dcit-16 was observed in two. The remaining five haplotypes (Dcit-9, -10, -14, -16 and -17) seen in samples from Pakistan were recovered from a single individual each. Among the 14 samples from India, six haplotypes were seen in seven psyllids and were found to be exclusive to that country. Haplotype Dcit-23 was observed in two individuals. The remaining five haplotypes (Dcit-24-28) were recovered from a single individual each. All the individuals from Saudi Arabia (n = 8) and Iran (n = 10) showed haplotype Dcit-1. Demographic analyses for this population revealed significant values for  Table 5). For SWA populations the Fst value was significant and relatively high (0.86623 P<0.05), suggesting that there is population structure within these populations. South East Asia (SEA) population. In all, 148 sequences were analyzed, and a total of six haplotypes were recovered for psyllids from this region. The haplotype (Hd) and nucleotide (μ) diversity for this region was of 0.354 and 0.00088, respectively. Haplotype Dcit-2 was observed in 118/148 (79.72%) of the SEA individuals, and it was the only haplotype observed in individuals from Taiwan (n = 4), Vietnam (n = 10), Singapore (n = 3) and Mauritius (n = 5). In China, five haplotypes were identified, three of which were exclusive to this country (Dcit-05, Dcit-08, and Dcit-22) and made up a small percent of the sample (4%, 1%, and 1%, respectively). The remaining haplotypes, Dcit-1 and Dcit-2, were observed in eleven individuals (14.67%) and fifty-nine (78.66%) individuals, respectively. Haplotype Dcit-18 was only observed in 9/31 (29.03%) individuals from Thailand. Demographic analyses in this region using haplotype (Hd) and nucleotide (μ) diversity revealed non-significant values for Tajima's D (-0.94), Fu and Li's D (0.08), and F (-0.31). The Fst value for this region was significant (P<0.05), and relatively high (0.80376), suggesting the existence of population structure.
North and Central America (NCA) population. A total of 327 specimens were successfully sequenced for collections from this region, and seven haplotypes were observed. Six of the haplotypes were recovered from psyllids gathered in a single collection site each. The haplotype and nucleotide diversity of this population was of 0.734 and 0.00294, respectively. The majority of the samples from this region were observed in group B1 311/327 (95.1%). Haplotype Dcit-1 was the most common haplotype in this group; it was seen in 92.66% of the NCA individuals. This haplotype was seen in psyllids from the five geographical regions analyzed. In Edinburg, Texas, U. S. A., 13/119 (10.92%) individuals showed haplotype Dcit-3, consistent with samples gathered in the Caribbean. Haplotype Dcit-4 was only observed in 6 individuals and was unique to a specific location in Yucatan, Mexico. In Los Angeles, California, U. S. A., two individuals showed haplotype Dcit-19. Haplotype Dcit-7, Dcit-20 and Dcit-21 were observed in a single individual each, from psyllids gathered in St. Lucie, Florida, Edinburg, Texas and Los Mochis, Sinaloa, Mexico respectively. Demographic analyses using haplotype (Hd) and nucleotide (μ) diversity estimates resulted in non-significant results for Tajima's D (-1.51), Fu and Li's D (-1.37), and F (-1.71). The Fst value for this region was significant (P<0.03), but relatively low (0.10132), suggesting no population structure.
South America population (SAM). Sequences from 56 individuals were included in the analyses. Only three haplotypes were recovered from individuals gathered in this geographic region. A total of 71.43% of the individuals carried haplotype Dcit-2, the most common haplotype found in SEA. This haplotype occurs in 44/56 of the analyzed individuals. These individuals were found in Argentina, Brazil, Paraguay, and Uruguay. Haplotype Dcit-2 is included in group B2. The second most common haplotype, Dcit-6, was found in all individuals from Colombia (12/12) and is common to SWA collections. Haplotype Dcit-6 in included in cluster A. Four individuals, all from a single location in Brazil, carried haplotype Dcit-1. Haplotype Table 5. Genetic diversity of D. citri populations analyzed in this study. Values in bold are statistically significant with p = < 0.02. The values on the columns represents Tajima's D (D 1 ), Fu and Li's D (D 2 ), Fu and Li's F (F), sample size (n), average number of nucleotide differences (k), segregating sites (S), nucleotide diversity (π), number of haplotypes (h), and haplotype diversity (Hd). Genetic variation and population structure of Diaphorina citri using cytochrome oxidase I sequencing Dcit-1 is found in group B1. Demographic analyses in this population using haplotype (Hd) and nucleotide (μ) diversity revealed no significant values for Tajima's D (1.60), Fu and Li's D (0.88), and F (1.28). The Fst value for this region was significant (P<0.05) and relatively high (0.81727), suggesting the presence of population structure. Caribbean population. For the Caribbean region, sequences from a total of 164 individuals were analyzed, and three haplotypes were observed. Haplotype Dcit-3 was observed in 89.63% of the collected CAR samples. Additionally, this haplotype was recovered from 13 individuals from Texas. In Puerto Rico, fourteen individuals showed haplotype Dcit-1, which also occurs in NCA, SAM, SWA and SEA. Three individuals (27%) from Guadeloupe exhibited haplotype Dcit-13, which was unique to this region. Demographic analyses in this population using haplotype (Hd) and nucleotide (μ) diversity revealed no significant values for Tajima's D (-1.12), Fu and Li's D (0.89), and F (0.66). The Fst value for this region was significant (P<0.002), but relatively low, suggesting no population structure for this geographic region.

Genetic diversity
A total of 1,108 DNA sequences from the COI mitochondrial region were analyzed, revealing the existence of 28 haplotypes and a haplotype and nucleotide diversity of 0.723 and 0.00294, respectively. Our results are in accordance with previous reports of the relatively low genetic diversity of D. citri in worldwide populations [27]. Because of this low genetic diversity, these results further suggest that D. citri is a single species and not a species complex. However, the population structure indicates barriers to gene flow. The numerous private haplotypes observed suggest differentiation and limited gene flow between the five different populations.
It is possible that this genetic isolation is caused by geographic barriers or biotic factors, such as infection with the endosymbiont Wolbachia. Recent studies indicated that 66% of all insect species may be infected with this bacterium [44], and Wolbachia has been found infecting D. citri. Infection with Wolbachia often causes reproductive abnormalities including cytoplasmic incompability (CI) [45]. It has been suggested that infection with different Wolbachia strains may lead to reproductive isolation between populations [46]. An association of COI sequences for D. citri and Wolbachia strains has been observed: U. S. A., Mexico, Belize and American Samoa D. citri samples were infected with Wolbachia strain ST-FL, as well as samples from Pakistan and Colombia [47]. These results suggest that NAM and SWA populations might be infected with the same Wolbachia strain. Similarly, D. citri samples from China, Singapore and Argentina carried Wolbachia strain ST-173, a different strain than the one from SWA and NAM. Another strain of Wolbachia was associated with samples from the Caribbean, Puerto Rico, Trinidad and Barbados [47]. These results further suggest possible isolation between populations infected with different Wolbachia strains.
The information provided in this study is also important for the development of effective biological control programs to control D. citri populations. The parasitoid Tamarixia radiata has been extensively used as a biological control agent for D. citri [48,49]. T. radiata has been successfully introduced to several countries, including Reunion Island, Taiwan, Mauritius, Guadeloupe, Florida, the Philippines and Indonesia, with varied results in parasitism rates of D. citri [29,50]. In Reunion, biological control with Pakistani T. radiata effectively reduced psyllid populations and helped to mitigate the impact of citrus greening [51]. However, when T. radiata (originating from Taiwan and South Vietnam) was released in Florida, the reported parasitism was lower than 20% [52,53]. The use of genetic analysis that identify the different evolutionary lineages or strains of T. radiata and D. citri [54] may improve parasitism rates and effectiveness of these biocontrol programs. It is possible that T. radiata populations, as a biological control agent, are more effective for D. citri management when matched with pests that originate from the same geographic region. However, other ecological factors should be considered to understand variation in rates of parasitism [27].

Phylogenetic analyses and population structure
Based on the phylogenetic tree and haplotype network it was not possible to determine the ancestral haplotype for D. citri. Previous analyses in the geographic origin of D. citri had proposed Dcit-01 as the ancestral haplotype [27,29], based on frequency and geographic distribution, and suggested a southwestern Asia origin. However, this assumption was based on limited sampling in this geographical region. Previously, only two haplotypes were reported for SWA: Dcit-01 and Dcit-06. In this analysis, additional samples from SWA were included, and a higher diversity was observed, as 13 haplotypes were recovered from this region. Even though Dcit-06 was the most frequent haplotype in SWA it has a limited geographic distribution as it was only observed in Pakistan, and in a few individuals from India and Colombia. The rest of the countries included in SWA (Saudi Arabia n = 8; Iran n = 10) expressed haplotype Dcit-01 only. More sampling from this region, especially from India, where the two haplotypes co-occur, is needed to confirm the geographic origin of D. citri.
SWA collections exhibited relatively high nucleotide diversity but low haplotype diversity. The haplotype network shows a star-shaped network for this region with a common central haplotype (Dcit-06), and a relatively high number of low-frequency haplotypes. These observations suggest that this is an expanding population, possibly caused by a recent introduction [55]. The diversity estimates generated for each region suggested a possible SEA origin. Relative to SWA (Hd = 0.218), SEA (Hd = 0.354) has high haplotype diversity, low nucleotide diversity, fewer segregation sites, and a high average number of nucleotide differences.
Consistent with results from previous analyses, Dcit-01 was the most geographically distributed haplotype in this study, and was present in all five geographical regions analyzed. However, it was more frequent in NAM populations (92.66%), when compared with the other geographical regions (SWA 7.51%; SAM 7.14%; CAR 8.55%; SEA 10.20%), suggesting a single introduction for NAM. This haplotype was the most frequent haplotype for all the collections from the New World (US, Mexico, Belize, Costa Rica), and it was also present and frequent for some countries in the Old World. In Saudi Arabia and Iran, Dcit-01 was the only haplotype found. It is possible that Dcit-01 originated in SWA, where it is relatively frequent. However, more sampling from this region is necessary to confirm this.
Haplotype Dcit-02 was the most common haplotype for SEA and SAM populations, suggesting that this Asian region is the geographic origin for South America psyllid populations. However, haplotype Dcit-01 was also present in Brazil, where it also co-occurred with Dcit-02, suggesting two independent introductions for this country. It is possible that Dcit-01 was introduced from SWA, or from China or Indonesia, where this haplotype co-occurs with Dcit-02. In addition, haplotype Dcit-06 was found in twelve individuals from Colombia, suggesting a possible third introduction for South America.
Haplotype Dcit-03 was found mostly in the CAR region with thirteen individuals showing this haplotype from Texas collections. Dcit-03 clustered within group B3 along with other haplotypes recovered from Texas (Dcit-20), Guadeloupe (Dcit-13) and India (Dcit-23). Since a closer relative has been found in India (Dcit-23), SWA could be a possible origin for this haplotype. Because this haplotype was found in some of the Texas samples, it is possible that a second introduction from the Caribbean to the U. S. A. occurred.
Even though relatively low genetic diversity was observed, the geographical distribution of the different haplotypes in our study support the existence of (1) five different populations within the different geographic regions: South West Asia (SWA), South East Asia (SEA), North and Central America (NCA), South America (SAM), and the Caribbean (CAR), and (2) four major haplotype groups (Cluster A, subclade group B1group B2 and group B3). A substantial portion of the haplotypes found in this study was unique for each region suggesting limited gene flow among the five geographic regions.