Population Genetic Structure and Potential Incursion Pathways of the Bluetongue Virus Vector Culicoides brevitarsis (Diptera: Ceratopogonidae) in Australia

Culicoides brevitarsis is a vector of the bluetongue virus (BTV), which infects sheep and cattle. It is an invasive species in Australia with an assumed Asian/South East Asian origin. Using one mitochondrial marker (i.e., part of the cytochrome oxidase subunit I gene) and six nuclear markers, we inferred population genetic structure and possible incursion pathways for Australian C. brevitarsis. Nine mitochondrial haplotypes, with low nucleotide sequence diversity (0.0–0.7%) among these, were identified in a sample of 70 individuals from seven sites. Both sets of markers revealed a homogeneous population structure, albeit with evidence of isolation by distance and two genetically distinct clusters distributed along a north-to-south cline. No evidence of a cryptic species complex was found. The geographical distribution of the mitochondrial haplotypes is consistent with at least two incursion pathways into Australia since the arrival of suitable livestock hosts. By contrast, 15 mitochondrial haplotypes, with up to four times greater nucleotide sequence diversity (0.0–2.9%) among these, were identified in a sample of 16 individuals of the endemic C. marksi (sampled from a site in South Australia and another in New South Wales). A phylogenetic tree inferred using the mitochondrial marker revealed that the Australian and Japanese samples of C. brevitarsis are as evolutionarily different from one another as some of the other Australian species (e.g., C. marksi, C. henryi, C. pallidothorax) are. The phylogenetic tree placed four of the species endemic to Australia (C. pallidothorax, C. bundyensis, C. marksi, C. henryi) in a clade, with a fifth such species (C. bunrooensis) sharing a common ancestor with that clade and a clade comprising two Japanese species (C. verbosus, C. kibunensis).


Introduction
Accurate delineation of a vector species' status and its population and evolutionary genetics are fundamental to biosecurity and mitigation of the emergence of exotic diseases.Cryptic species (i.e., morphologically indistinguishable but genetically different species) are widespread among eukaryotes and prokaryotes, including all major orders of insects (e.g., Lepidoptera [1], Diptera [2], and Hemiptera [3]).The detection of cryptic species in the dipteran Culicoides variipennis species complex and the subgenus Culicoides [4] had major implications for our understanding of vector competence for Bluetongue virus (BTV) [5].
Culicoides wadai, C. actoni and C. fulvus, while competent vectors of BTV, have only been found in northern Australia [47].BTV is asymptomatic in cattle across the ranges of C. brevitarsis, C. wadai, C. fulvus, and C. actoni in Australia.By contrast, sheep are highly susceptible to BTV but are farmed predominantly in the southern parts of Australia, where they overlap geographically with C. brevitarsis in its southern-most range of New South Wales (NSW).Therefore, a transfer of BTV into sheep could occur in these parts of Australia, posing a risk to this region's livestock industry.
Given that C. brevitarsis is a competent vector of BTV and regarded as the most likely vector of BTV between Australian cattle and sheep, it is important that a better understanding of this species be developed.In particular, information on its population genetic structure is needed, but we also need to determine whether there is evidence of it being a cryptic species complex, whose members could have different levels of vector competency for BTV.Finally, because C. brevitarsis may have been recently introduced to Australia, comparing Southeast Asian and Australian samples of Culicoides might reveal the evolutionary relationship among these samples, and point to the likely incursion pathways.Such information could become critical for a successful Australian response to a BTV epizootic event like that in Europe (reviewed in [48][49][50]).The C. variipennis species complex illustrates the importance of such information: different subspecies in this complex have different levels of susceptibility to BTV infection and different virus competency rates [5].Amplicons from individual samples were purified and eluted in 25μL ddH 2 O using Qiagen's Qiaquick PCR purification kit (Cat.no.28104).For sequencing of individual samples, 2μL of purified amplicon for each sample was used as template in individual DNA sequencing reactions, which also contained 1.6mM primer, 1.5μL of 5x BigDye sequencing buffer and 1μL of BigDye V3 sequencing reaction mix, made up to a final volume of 10μL.The sequencing profile followed that of Behere et al. [16] prior to electrophoresis at John Curtin Medical School of Research, Australian National University, Canberra, Australia.

Design of exon-primed intron-crossing markers and microsatellite DNA markers
We also designed ribosomal protein (Rp) gene exon-primed intron-crossing (EPIC) markers for C. brevitarsis based on the methods reported in [52] and [19] Microsatellite DNA markers were designed by searching reported microsatellite loci sequences available in the GenBank.We followed the methods of Tay et al. [53] to screen against the potential of incorporating flanking primer-annealing sequences that might be due to Transposable Elements (TEs).Briefly, four microsatellite DNA loci (i.e., Cimi-12 [GenBank: DQ008970], Cimi-66 [GenBank: DQ008971], Cimi-69 [GenBank: DQ008972] and Cimi-85 [GenBank: DQ008973]) were searched for possible target-site duplication sequences, as well as reverse-translated using BLASTX [54] to identify the likely presence of a reverse transcriptase gene, and to search for the presence of target-site duplication sequences that would indicate Class I retrotransposable TE sequence insertions.We note that using BLASTX to identify the presence of reverse transcriptase is only effective against Class I (RNA-associated/'copy and paste') TEs but ineffective against Class II (DNA-base/'cut and paste') TEs (see [55]).
Conditions used during PCR amplification and expected amplicon sizes for the EPIC and microsatellite DNA markers are provided in Table 2.We used the Fragment Profiler software within the MegaBACE Genetic Profiler suite v1.2 (GE Healthcare Life Sciences, UK) for allele scoring of EPIC and microsatellite DNA loci of biting midges samples in this study.EPIC and microsatellite DNA markers were labelled either with TET or FAM [19], and electrophoresis of PCR amplicons was carried out at the Genetic Analysis Facility, James Cook University (Townsville, Queensland, Australia), as described in Behere et al. [19].Inter-and intra-species nucleotide diversity and genetic distances in Culicoides Primary sequencing reads were assembled into contigs using Pregap4 and Gap4 from the Staden package [56] and the resulting contigs were then aligned using Se-Al version 2.0a11 (http://tree.bio.ed.ac.uk/software/seal/).To identify unique haplotypes, the contigs were clustered using a K-align rapid multiple sequence alignment (MSA) algorithm for DNA sequences (http://www.ebi.ac.uk/Tools/msa/kalign).Nucleotide sequence diversity among the C. brevitarsis haplotypes (i.e., Cbrev-01, . .., Cbrev-09; 835 base pair (bp)), and among the C. marksi haplotypes Cmarksi-01, Cmarksi-03, . .., Cmarksi-16 (547bp) were calculated as: (i) p distances (i.e., the proportion of sites with different nucleotides within pairs of sequences); or (ii) evolutionary distances using Tamura and Nei's [57] nucleotide substitution model.In practice, we used MEGA version 5.05 [58] to obtain these estimates.Rate-heterogeneity across sites was modelled using a discrete Γ distribution with four rate categories and a shape parameter (α = 0.1759; model inferred using MODELTEST version 3.7 [59]).Standard errors of genetic distance estimates were obtained using nonparametric bootstrapping with 500 replicates.Mean genetic distances between Culicoides species were also obtained from an alignment of 455bp under the conditions described above for the within-species comparison.Unique haplotypes have been deposited in GenBank [GenBank accessions: KP201844-KP201872].
MtDNA COI sequences from Culicoides species previously deposited into GenBank, covering the same 547bp region as the C. marksi haplotypes reported in this study, were also included (Table 1) in our study in order to estimate nucleotide sequence divergence between and within species.GenBank entries were found using BLASTN [54] search with Culicoides as the Entrez [60] query against the Cmarksi-01 haplotype query sequence and specifying 'Other (nr etc.)' as the database.
The mtDNA COI haplotype of the individual from Grafton [GenBank: KJ162974] is likely to be Cbrev-01 or Cbrev-05 (due to the presence of a C in position 646, which it shares with Cbrev-01 and Cbrev-05, but not the other Australian sequences; S4 Table).To differentiate between Cbrev-01 and Cbrev-05 would require a C or a T at position 1,144, but this position was unfortunately not sequenced.Both Cbrev-01 and Cbrev-05 have been found at Grafton, so KJ162974 is most likely Cbrev-01 or Cbrev-05.The sequence from Grafton [KJ162974] was also identical over a shared 646-655bp region to those from the Solomon Islands [GenBank: KJ162971, KJ162970, KJ162969, KJ162967] and Timor-Leste [GenBank: KJ162968].

MtDNA COI haplotype network
Haplotype networks were drawn using TCS version 1.21 [62] for the 835bp mtDNA COI haplotypes of C. brevitarsis, and for the 547bp mtDNA COI haplotypes of C. marksi.The resulting networks were subsequently refined by hand.
Analysis of Australian C. brevitarsis population structure based on partial mtDNA COI gene sequence and EPIC-microsatellite DNA markers (I) Inference of population structure based on mtDNA COI partial gene.Using 835bp of the partial mtDNA COI gene, we estimated genetic differentiation among and between C. brevitarsis populations across northern and eastern Australia using Weir and Cockerham's [63] single locus F-statistics, as implemented in GENEPOP version 4.2 [64,65].Estimates of diversity indices (haplotype diversity (h) and nucleotide sequence diversity (π)) in C. brevitarsis (based on 835bp) and C. marksi (based on 547bp), and the Analysis of Molecular Variance (AMOVA) [66] for C. brevitarsis was performed using Arlequin version 3.11 [67].To identify whether the Australian C. brevitarsis population could be subdivided into smaller groups, we performed an AMOVA whilst considering the following entities: (i) a single population (i.e., where all the Australian specimens of C. brevitarsis belong to a single population); (ii) a northern population (comprising specimens from Kalumburu, Douglas-Daly, and Katherine) and an eastern population (comprising specimens from Lismore, Grafton, Bellingen, and Paterson), and (iii) a large number of site-specific populations (i.e., where each collection site has its own population).The reason for assuming the northern and eastern populations was based on the general close proximity of collection sites to each other and did not reflect the midges' flight capability, which is usually dependent on environmental factors (e.g., weather conditions/wind patterns [29,68]).We also calculated the inter-population F ST between the seven C. brevitarsis populations using GENEPOP (Option 6) to determine whether geographic distance between locations may have contributed to an underlying population structure; to investigate this, we plotted geographic distances (km, estimated from GPS data) against the F ST /(1-F ST ) values.
(II) Inference of C. brevitarsis population structure based on nuclear markers.Using Arlequin, we also obtained results from an AMOVA based on the EPIC (RpS4, RpS2A, RpS2B, RpL27A, RpS6, and RpS8) and microsatellite (Cimi-69, and Cimi-85) markers.We excluded RpS4 and RpS2A and the two microsatellite loci from our analyses because: RpS4 and RpS2A were monomorphic; Cimi-66 was found to potentially contain a 5-bp (TTGCA) target-site duplication sequences surrounding the (CA) n SSR units; and Cimi-12 amplified inconsistently with >3 alleles detected in various C. brevitarsis individuals (data not shown).Estimates of nucleotide sequence diversity (π), haplotype (gene) diversity (h), F IS and their associated P values, pair-wise F ST values between populations, average number of alleles (a), allele richness (r), and observed and expected heterozygosity estimates (H O , H E ) across all nuclear markers for each population were also obtained using Arlequin, and departures from Hardy-Weinberg equilibrium were calculated using GENEPOP.Pair-wise F ST estimates between populations with significant P-values were adjusted using the Holm-Bonferroni correction [69] at the family-wise error rate of α = 0.05.We determined the presence of isolation by distance for our C. brevitarsis populations using the transformed (i.e., F ST /(1-F ST )) pair-wise population F ST estimates (i.e., Slatkin's [70] linearized F ST ), plotted against natural logarithm of geographic distances between locations of individual populations, with the significance of observed associations determined using Spearman Rank Correlation coefficient with 5,000 permutations, as implemented in GENEPOP.

Structure analysis
We used the Bayesian MCMC simulation program STRUCTURE [71] to determine the most likely number of ancestral gene clusters based on the current population nuclear marker (four EPIC and two microsatellite) genetic data for C. brevitarsis.The rationale and parameters used to calculate the ad hoc statistic ΔK values (this provides an indication of the second-order rateof-change to the estimated log probability of data Pr(X|K) between increasingly large values of K (i.e., number of genetic clusters)) were as described in Behere et al. [19].Once the best ΔK values were found, a final simulation (comprising 500,000 MCMC samples and a burn-in of 50,000) was conducted to obtain the most likely ancestral population structure.

Phylogenetic analysis
A phylogenetic tree was inferred using a multiple sequence alignment (MSA) of 182 codons from the mtDNA-encoded COI genes of the 136 Culicoides sequences (Table 1).Orthologous sequences from Anopheles quadriannulatus [GenBank: DQ792581], A. gambiae [GenBank: Nc_002084], A. dirus D [GenBank: AJ877572], and A. janconnae [GenBank: HQ335348] were aligned to these 136 sequences, allowing us to root of the inferred tree of Culicoides.The MSA (see S1 and S2 Files) was inferred using SeaView version 4.4.2[72].
Initially, the MSA of codons was translated into amino acids and then partitioned into MSAs of codon sites.Whenever sequences in these four MSAs were found to be identical to one another, all but one of these sequences were removed to reduce bias in the phylogenetic estimates.
Most molecular phylogenetic methods assume that the sequences have evolved under globally stationary, reversible, and homogeneous (SRH) conditions (defined in [73][74][75][76][77]).To determine whether these conditions were met by the data, we assessed each alignment using a matched-pairs test of symmetry [73,78].In practice, we used Homo <www.csiro.au/homo>,a program that returns a χ 2 -distributed test statistic for all pairs of sequences.The corresponding P values were then charted in a PP-plot against the expected P values.A J-shaped distribution of dots in this plot, with a larger-than-expected proportion of low P values, indicates that the assumption of evolution under globally SRH conditions is violated by the data.
Given the results of our assessment of model misspecification (described above) and other considerations (described below), we conducted our phylogenetic analysis on the MSA of first and second codon sites.In practice, we used IQTREE version 0.9.3 [79] to find the most likely (ML) tree.In so doing, we first identified the best model of evolution for the data using the-m TESTONLY option.Next, we inferred the most likely tree and completed the phylogenetic analysis with a nonparametric bootstrap analysis (using the-bb 1000 option).The result was displayed using FigTree version 1.40 <http://tree.bio.ed.ac.uk/software/figtree>.

Ethics
The authors declare that the research does not involve human subjects, human material or human data.The study also does not involve vertebrates or any regulated invertebrates, and therefore does not require any ethics committees' approval.
A total of 15 haplotypes was found in our alignment of 16 partial sequences (547bp) of the mtDNA marker from C. marksi (all from Murray Bridge in SA and Moree in NSW.Two of the samples from Moree were identical; Figs 1 and 3), indicating a high level of genetic diversity among these samples.The evolutionary divergence between pairs of these sequences ranged from 0.0 to 2.8% (S2 Table ), which is up to four times greater than those between the Australian C. brevitarsis.
The average genetic distances between all Culicoides species sampled in Australia and those available from GenBank (S5 Table ) indicate high levels of genetic diversity between the species, with an average genetic distance of 0.153 ± 0.009 SE (range from 0.110 ± 0.014 SE (C.pallidothorax vs. C. henryi) to 0.264 ± 0.020 SE (C.humeralis vs. C. actoni)).When third codon sites were excluded (reason given below), the genetic distances between the species were much lower (average = 0.059 ± 0.007 SE; range from 0.007 ± 0.004 SE, (C.kibunensis vs. C. verbosus) to 0.172 ± 0.021 SE (C.actoni vs. C. jacobsoni)).The average genetic distances within species with at least two haplotypes ranged from 0.0 (e.g., in C. wadai) to 0.065 (in C. matsuzawai) across the trimmed 547bp region.Differentiation of Culicoides species based purely on mtDNA COI sequence divergence may be impossible (for a discussion, see Meier et al. [80]) due to over-lapping intra-and inter-specific genetic distances (i.e., the lower range between species is 0.096 (C.pallidothorax vs. C. henryi) while the maximum intra-specific genetic distance is 0.082 (e.g., C. brevitarsis) (S5 Table ), a difference of only 1.4%).
Estimates of haplotype diversity (h) and nucleotide sequence diversity (π) were both higher for the Australian endemic C. marksi (h = 0.976 ± 0.012; π = 0.017 ± 0.001) than for the Australian C. brevitarsis (h = 0.586 ± 0.172; π = 0.001 ± 0.001).We did not detect any isolation-bydistance effect between the Australian populations of C. brevitarsis based on the mitochondrial DNA COI partial gene marker, with these populations lacking significant substructure as indicated by a low F ST value (F ST = 0.026), and non-significant correlation between geographic distances and F ST /(1 -F ST ) values (R 2 = 0.0016), possibly due to the wide geographical distribution of the Cbrev-01 haplotype (Fig 1).Hierarchical population structures were not found using AMOVA-estimated fixation indices: F CT (among groups), F SC (among populations within groups) and F ST (within populations) (none of the associated P-values were significant, Table 3).The AMOVA showed that 96.6% of the variance occurs at the within-populations level, while 1.8% and 1.6% of the variance were at the among-groups level (i.e., within Australia) and among-populations within-groups level (i.e., between northern and eastern populations), respectively.

Analyses of the nuDNA markers from Australian C. brevitarsis
Two exon-primed intron-crossing (EPIC) DNA markers (i.e., RpS4 and RpS2A) were found to be monomorphic across all the populations of C. brevitarsis that we screened.We sequenced the amplicons of RpS4 and RpS2A from individuals of C. brevitarsis, confirming the primer annealing sites and the partial coding region sequences for RpS4 and RpS2A.Despite the expected presence of an intron within the sequences examined when compared against the Bombyx mori homologs, the expected introns in these two Rp gene regions were missing (sequences available on request), providing the reason for the lack of polymorphisms for these two EPIC DNA markers.This, therefore, leaves the availability of four EPIC markers (i.e., RpS2B, RpS6, RpS8, RpL27A) as well as two microsatellite DNA markers (i.e., Cimi69, Cimi85) to infer the population genetic structure of C. brevitarsis.
The results obtained from the AMOVA of the six nuDNA markers from C. brevitarsis were largely in agreement with those obtained from the mtDNA marker, with the exception that the fixation index for the among populations within groups comparisons (F SC = 0.020) was significant at both the mean and lower estimates, but not at the higher estimate (P value = 0.047 ± 0.008).The availability of results from analyses of nuDNA markers enabled a fine-scale investigation of within population variation, which showed that 93.3% of the variation was at the within individual level relative to the total population (F IT ) level (Table 3).Pairwise F ST estimates between all populations obtained using the mtDNA marker did not reveal any population substructure (Table 4) due to the presence of the Cbrev-01 haplotype in all populations.By contrast, the nuDNA markers disclosed significant signatures of limited gene flow between the northern region populations (e.g., Kalumburu B, Douglas-Daly) and four of the southern region populations (Table 5; six out of 28 comparisons).However, following a Holm-Bonferroni correction, only two pair-wise F ST estimates were significant (i.e., Douglas-Daly vs. Lismore; Douglas-Daly vs. Paterson; Table 5).Based on the F IS estimates (Table 6), no significant inbreeding at the within subpopulation level could be detected.Significant isolation-by-distance was detected using the nuDNA markers (R 2 = 0.192, P value = 0.0152), indicating that neighbouring populations tended to be genetically more similar to one another than they are to distant populations.Overall, the six nuDNA markers appeared to assort independently, and using Fisher's exact test did not disclose any departure from Hardy-Weinberg equilibrium, the only exceptions being Cimi-85 for Kalumburu and Douglas-Daly, and RpS6 and RpL27A for Paterson (due to excess of homozygotes; results not shown).

STRUCTURE analysis
Using the method of Evanno et al. [81] to calculate the number of genetically homogenous groups of individuals in a population, we found that the nuDNA markers data (Fig 4) was best explained as two genetically homogeneous sets of individuals (ΔK = 8.81).We did not apply the mtDNA data to the STRUCTURE analysis as this method is best applied to sequence data with sufficient recombination rates (i.e., its application based on non-recombination genomes such as the sex chromosomes (i.e., Y-chromosome) and genomes with predominantly uniparental inheritance mode (i.e., chloroplast or mitochondrial DNA) should be interpreted with caution [71]).The STRUCTURE result suggests the Australian C. brevitarsis population structure was best explained by unequal contributions from two genetically distinct ancestral clusters, with the 'Red' and 'Green' genetic clusters following latitudinal clines from the northwestern/northern populations to the south-eastern/southern populations, although increasing sampling sites and marker numbers will be needed to better explore this genetic signature in future studies.This analysis therefore shows that although the northern and southern C. brevitarsis populations are the same species and that gene flow has occurred between them, there is also evidence that they represent two overall genetic groups of individuals (exemplified by differences between populations in the north (e.g., Kalumburu) and populations in the south (e.g., Paterson)).This may have implications for the capacity of the BTV to spread (see Discussion).

Phylogenetic analysis
Fig 5 shows four PP-plots with results from the matched-pairs test of symmetry for the four MSAs of amino acids, first codon sites, second codon sites, and third codon sites.In terms of being suitable for phylogenetic analysis using time-reversible Markov models, the data sets are ranked (from best to worst): amino acids, second codon sites, first codon sites, and third codon sites.Thirty-one per cent of the tests led to P values below 0.05 for third codon sites (5% was expected), and the lowest observed P value was 1.33 × 10 −9 , which still is significant after a Holm-Bonferroni correction [69] to counteract the effect of multiple comparisons (P cor = 2.76 × 10 −6 ).For first codon sites, the corresponding values were 10.7% and P cor = 0.5395.These results show that it would be unwise to use third codon sites in a phylogenetic study that assumes time-reversibility.The numbers of phylogenetically informative sites were 44, 49, and 13 for the MSAs of amino acids, first codon sites, and second codon sites, respectively, implying that it would be preferable to use MSAs of first and second codon sites to infer the phylogenetic tree.In order to increase the number of phylogenetically informative sites in the MSA used to infer the phylogenetic tree, we concatenated the MSAs for these two codon sites.
Using IQTREE, we found the best model of evolution to be TN+I+G4, implying that the accumulating nucleotide substitutions was best approximated by Tamura and Nei's [57] model of nucleotide substitution, with rate-heterogeneity across sites modelled assuming a proportion of invariable sites (I) and a discrete Γ distribution with 4 rate categories (G4).However, in practise the phylogenetic analyses were conducted under the TN+I+G4 and TN+G4 models to accommodate a known difficulty in estimating both I and Γ accurately (see page 114 in [82]).
Fig 6 presents the ML tree inferred from the concatenated alignment of first and second codon sites.The tree divides the Australian samples into two distantly related groups.The first group includes the Australian C. brevitarsis, which consistently (97%) formed a sister group to the Japanese C. brevitarsis.The second group includes C. marksi, C. henryi, C. pallidothorax, C. bundyensis, and C. bunrooensis, which are endemic Australian species.However, this group may also include two Japanese species (C.verbosus and C. kibunensis).Whether this is the case was impossible to determine from the data (the low bootstrap values for some of the edges and the presence of trifurcations in the tree indicate that the number of informative sites is too low to cast light on this issue).Notwithstanding this shortness of data, it is worth mentioning the evolutionary distance between Australian and Japanese samples of C. brevitarsis is as large as those between some of the other Australian endemic species (i.e., C. marksi, C. henryi, and C. pallidothorax).Looking at the phylogenetic tree more broadly, it is characterised by many low bootstrap values and large variation in the root-to-tip distances.Taken together, these features are consistent with the number of phylogenetically informative sites being too small, implying that much longer sequences are needed to accurately resolve the phylogenetic relationships among these species.
With respect to the phylogenetic relationship among competent and potential vectors of BTV in Australia, our tree suggests that the five endemic species considered here (i.e., C. marksi, C. henryi, C. pallidothorax, C. bundyensis, and C. bunrooensis) do not share a close common ancestor with BTV-competent vectors (i.e., C. brevitarsis, C. wadai, C. fulvus, and C. actoni) (note that we did not have access to Australian samples of C. fulvus, C. wadai, and C. actoni and, therefore, were unable to determine whether Japanese samples of C wadai and C. actoni are the closest relatives of, respectively, the Australian samples of C wadai and C. actoni).Although this does not guarantee that these endemic species cannot become vectors of BTV, it does imply that they are so evolutionarily different from known vectors of BTV in Australia, that they might not be potential vectors of BTV.To better address this issue, it will be necessary to consider much longer sequences (e.g., preferentially the whole mitochondrial genome; multi-loci sequence analysis) and more samples and species from, in particular, Australia.Maximum Likelihood phylogenetic tree including bootstrap values of various Culicoides species from this study.Maximum Likelihood tree, with bootstrap values for some of the internal edges inferred using the UFBoot method.The tree was inferred under the TN+I+G4 model of evolution.Samples are marked using their species name, geographical origin (AU: Australia; JAP: Japan), haplotype name (when available, listed in brackets), and a number (whenever needed, to distinguish some of the sequences).Australian Culicoides are within blue boxes, and known BTV competent Asian/South East Asian Culicoides hosts are in red boxes.The scale bar measures evolutionary distance in terms of average nucleotide substitutions per site.The tree was rooted using four species of Anopheles.The ML tree inferred under the TN+G4 model of evolution differed from this one at seven internal edges, but the overall conclusion remained the same (see text).doi:10.1371/journal.pone.0146699.g006
A comparison of Australian, Japanese, Chinese, Solomon Islands, and Timor-Leste C. brevitarsis mtDNA COI sequences, available through the studies of [61] and [51], showed that the Japanese samples, and to a lesser extent, the Chinese samples, are distantly related to the Australian, Solomon Islands, and Timor-Leste samples (S1-S3, S5 Tables).On the other hand, one of the two haplotypes from Timor-Leste [KJ162968] was identical across at least 646bp to an Australian [KJ162974] and all four Solomon Islands individuals [51].This haplotype [KJ162968] was also identical to our Cbrev-01 and Cbrev-05 haplotypes between positions 324 and 646, and is thus consistent with the expected introduction scenario for this haplotype (i.e., Cbrev-01), with an incursion route from Timor-Leste to the northwest of Australia, based on the prevailing weather/wind patterns.In order to differentiate the most common haplotype previously found [51], the authors would need to have sequenced further along the COI gene, as was done by us for Cbrev-01 and Cbrev-05, which identified a single base pair difference at position 1144 (S4 Table ).

Culicoides brevitarsis mtDNA COI haplotypes
Our analysis of the mtDNA marker show that the C. brevitarsis samples from northern and southern Australian sites (i.e., the full geographical range of C. brevitarsis in Australia) is a single species with low nucleotide sequence and haplotype diversities across >800bp of this genome.There is no support for the notion that C. brevitarsis in Australia is a set of cryptic species.Culicoides brevitarsis has been recorded in Australia for over 100 years and is believed to be an introduced species [83] of Asian or South East Asian origin [39].The spread of C. brevitarsis to its current southern limit is likely to be linked to European settlements, and its presence in northern Australia may also be associated with the introduction of ruminants [84] such as water buffalo (Bubalus bubalis) since around 1825<https://www.environment.gov.au/biodiversity/invasive-species/publications/factsheet-feral-water-buffalo-bubalus-bubalis>.The inclusion of additional samples from South East Asia (e.g., Papua New Guinea, Timor-Leste, Indonesia, Malaysia, the Malay Archipelago), Micronesia, Melanesia, and Asia (e.g., India) would help to clarify what are the most likely origins of the Australian C. brevitarsis populations.However, the difficulty in obtaining good samples from these regions probably represents the single most challenging factor to future similar studies.
The mtDNA COI gene has been widely used to differentiate insect species (e.g., [1,13,85]).Differentiating dipteran species based on the mtDNA COI gene generally has had a low success rate due to overlapping ranges of genetic distances during intraspecific and interspecific comparisons [80].Differentiating Culicoides species using genetic distances of the mtDNA COI gene may therefore also be of limited value due to overlapping intra-and inter-specific genetic distances (e.g., the lower range between C. henryi and C. pallidothorax is 0.096 while the maximum intra-specific genetic distance between individuals of C. brevitarsis is 0.082 (S5 Table ); a difference of only 1.4%).To better understand the genetic relationship between C. pallidothorax and C. henryi, ecological, behavioural, and geographical information will be needed in addition to morphological and sequence data (e.g.integrative taxonomy, see [86,87]).A high intraspecific distance (> 0.110) was also reported for C. obsoletus and C. newsteadi, while the mtDNA barcode method failed to identify two closely related pairs of species as four separate species (i.e., C. festivipennis and C. clastrieri (0.001); C salinarius and C. manchuriensis (0.017)) due to low interspecific distances [35].

Culicoides marksi mtDNA COI haplotypes
Analysis of 16 endemic C. marksi individuals from two populations detected greater haplotype diversity compared with all C. brevitarsis samples combined, and therefore adds support to the introduced species status of C. brevitarsis.Culicoides marksi is an important vector of many orbiviruses (e.g., Tables 1 and 2 in [88]).The species feeds readily on marsupials, and is distributed across the whole of Australia and into Papua New Guinea.A rare wing pattern variant of C. marksi has been found in the Northern Territory [27].This, and the high level of genetic variation present in C. marksi populations, suggests that a future study should consider examining the likely presence of cryptic species within the currently recognised C. marksi species, as this might have been an important factor that contributed to previous unsuccessful artificial BTV infection experiments [88].The high level of mtDNA haplotype diversity in C. marksi contrasted with that found in C. brevitarsis.Factors contributing to these differences between the two species are likely to include species-specific differences in flight patterns, dispersal ability, host-assisted or host-mediated dispersal, and preferred environmental/weather conditions and breeding sites (e.g., [41,[89][90][91]).

Culicoides brevitarsis gene flow patterns inferred from nuDNA markers
Although the mtDNA genome is well suited for studying maternal gene flow patterns, inference of paternal gene flow patterns requires the inclusion of nuDNA markers (e.g., microsatellite and EPIC DNA markers).In this regard, EPIC DNA markers have not been used as widely as microsatellite DNA markers (e.g., [52]), but they are still highly suitable for inferring population genetic structures (e.g., [19]), especially in taxa with transposable elements (TEs) in their genomes, such as the Diptera (e.g., [53] and references therein).Here we demonstrated the usefulness of EPIC DNA markers, in addition to microsatellite DNA markers, in a study of gene flow patterns in Australian C. brevitarsis.EPIC and microsatellite DNA markers identified a population structure and gene flow patterns that are consistent with those detected using the mtDNA marker, with additional power to detect isolation by distance and genetic clusters in a species with limited polymorphism in its mtDNA.The discovery of isolation by distance using nuDNA markers but not mtDNA markers may imply sex-specific differences in dispersal patterns, feeding and/or mating behaviours in C. brevitarsis, although confirmation of this will require further work.Our STRUCTURE analysis of the nuDNA markers revealed the presence of a northern cluster and a south cluster and also supported the discovery of isolation by distance, with north-western populations (e.g., Kalumburu) and south-eastern populations (e.g., Paterson) being genetically less similar to each other that they are to their neighbouring populations.Two potential genetic clusters of C. brevitarsis from Australia and its neighbouring regions were also identified based on multiple microsatellite DNA markers by Onyango et al. [92], although this study would have benefited also from the inclusion of the maternally inherited mtDNA marker, for inference of genetic structure and incursion pathways from the maternal lineage perspective, given that it is the female species that is the main arbovirus vector.

Phylogeny of Culicoides
The alignment of 182 codons encoding some of the mtDNA CO I gene from Culicoides was characterised by compositional heterogeneity across the sequences at third codon sites, but not at first and second codon sites, implying that the mitochondrial genomes of many of the 25 species examined here must have evolved under non-stationary, reversible, and homogeneous (SRH) conditions, and that natural selection must have been sufficiently strong to hide evidence of this at the non-synonymous codon sites.This is consistent with reports on the effects of directional mutation pressure on non-synonymous codon sites of protein-coding genes in animal mitochondria [93].Since compositional heterogeneity across sequences may bias phylogenetic estimates [94] and most model-based molecular phylogenetic methods assume that the data have evolved under globally SRH conditions, we decided to use first and second codon sites in our phylogenetic analysis of Culicoides, even though the number of phylogenetically informative sites was relatively small for these two codon sites.This type of consideration is not as uncommon as it used to be (for examples, see [95,96] and papers cited therein).
The inferred phylogeny of the mtDNA COI marker from our Culicoides samples was characterised by large variation in the root-to-tip distances and bootstrap values ranging from 11% to 100%, which is consistent with the data containing too few phylogenetically informative sites.Therefore, it was difficult to have much confidence in the inferred phylogenetic tree (Fig 6).However, assuming the tree is a good representation of the underlying evolutionary history of Culicoides, it would appear that the Australian species are divided into two groups, with C. brevitarsis being more closely related to species found in Japan (e.g., C. jacobsoni, C. brevipalpis, C. maculatus, C. peregrinus, and C. actoni) than species found in Australia, and with the remaining Australian species (C.marksi, C. henryi, C. pallidothorax, C. bundyensis, and C. bunrooensis) forming a closely related group of species.

Biosecurity implications of C. brevitarsis northern and eastern mtDNA COI haplotypes
Dyce [39] suggested that Asian or South East Asian species such as C. brevitarsis, C. oxystoma and C. peregrinus may have arrived in Australia after early European settlement and the establishment of domesticated herbivores in the tropical north, with the proposed path to colonisation of C. brevitarsis being from Timor-Leste into the northern regions of WA, NT, and Queensland (QLD) prior to splitting into (i) a northern route (to colonise northern QLD and Papua New Guinea, and across to the Melanesian islands (e.g., Solomon Islands, Fiji, and maybe New Caledonia)), and (ii) a southern route (to colonise central QLD, northern and central NSW, and Lord Howe Island) (see Fig 2 of [97]).Using simulation studies, Eagles et al. [90] found that if the potential source of the Australian population were Timor-Leste, then, given the current climatic conditions and weather patterns, the Australian regions with the greatest risk of windborne incursion would be the top end of NT and the Kimberley region of WA.Our analysis of Culicoides mtDNA haplotypes from Australia, Timor-Leste, and the Solomon Islands is consistent with the conclusions of Eagles et al. [90].
The detected distributional patterns of C. brevitarsis mtDNA haplotypes and nuclear genetic clusters imply that multiple independent incursions may have occurred into Australia after the arrival of suitable domesticated cattle hosts (i.e., since European settlement began in Australia).Similarly, Bellis et al. [98] also identified evidence to support multiple incursion pathways into Australia from surrounding regions from mtDNA marker analysis.Based on our comparisons of mtDNA haplotypes from neighbouring regions and throughout Australia, four hypotheses can be put forward to assist future in-depth studies of C. brevitarsis incursion pathways and population spread: )) suggests other possible incursion events, with members of likely source populations again being carried by trade winds from, for example, Timor-Leste or Indonesia [90]; and (iv) based on mtDNA haplotypes not present in north-western/northern Australia, at least one additional independent incursion event must have occurred, possibly from New Guinea (i.e., similar to that proposed for C. dumdum) [99] into northern QLD, which subsequently spread to our eastern sampling sites (Paterson, Grafton and Lismore).
The small number of specimens analysed from Timor-Leste and the Solomon Islands may explain the limited number of haplotypes detected in these regions.Increasing sample sizes from, for example, Timor-Leste, Papua New Guinea, Indonesia, and Solomon Islands will enable better testing of these hypotheses especially hypothesis (iv) listed above.Adding support to the proposed hypothesis of multi-incursion points into Australia of C. brevitarsis is the identification of other region-specific mtDNA haplotypes (i.e., Cbrev-04 and Cbrev-06 in the north and Cbrev-05, Cbrev-07, Cbrev-08, and Cbrev-09 in the east), although further confirmation is needed and will require extensive sampling and population genetic analysis of samples from especially northern QLD, which were not available for this study.Molecular analysis of the BTV serotypes (see [84] for a review) supports Dyce's [97] hypothesis of a northwest-southeast spread, with an origin in the Indian subcontinent, although multiple potential Culicoides incursion origins (e.g., from west (e.g., Lombok) to east (Fly (Western Province, PNG) into northern Australia (WA, NT, QLD) have been reported [90,100].Furthermore, differences in BTV virulence patterns (Table 3 in [84]) also lend support to the proposed multiple entry points of C. brevitarsis based on the presence of disjoint northern and eastern populations, as is evident from the region-specific mtDNA haplotypes, and may be reflected by geographic clines of genetic clusters based on nuclear DNA markers.

Conclusions
Our results showed that C. brevitarsis, with a relatively well-established range across Australia and its neighbouring countries such as Indonesia, Timor-Leste, Papua New Guinea, and Solomon Islands, may have been introduced several times into Australia, as indicated by the heterogeneous patterns of geographical distribution of the mtDNA haplotypes in northern and eastern populations.The results highlight a need for further in-depth studies of incursion pathways (e.g., significantly increase sampling sites, marker numbers, and sample over multiple seasons) of C. brevitarsis and other vectors of viruses in Australia and surrounding regions.

Fig 5 .
Fig 5. PP-plots for P values distributions from matched-pairs tests of symmetry for mtDNA COI sequence data sets.PP-plots with the distributions of P values from the matched-pairs tests of symmetry for the following data sets: amino acids, first codon sites, second codon sites, and third codon sites.A J-shaped distribution of dots, with more than 5% of the dots below 0.05 (i.e., the horizontal line) indicates violation of the assumption of evolution under globally SRH conditions.The PP-plot for third codon sites represents a clear example of violation of this phylogenetic assumption.doi:10.1371/journal.pone.0146699.g005

Fig 6 .
Fig 6.Maximum Likelihood phylogenetic tree including bootstrap values of various Culicoides species from this study.Maximum Likelihood tree, with bootstrap values for some of the internal edges inferred using the UFBoot method.The tree was inferred under the TN+I+G4 model of evolution.Samples are marked using their species name, geographical origin (AU: Australia; JAP: Japan), haplotype name (when available, listed in brackets), and a number (whenever needed, to distinguish some of the sequences).Australian Culicoides are within blue boxes, and known BTV competent Asian/South East Asian Culicoides hosts are in red boxes.The scale bar measures evolutionary distance in terms of average nucleotide substitutions per site.The tree was rooted using four species of Anopheles.The ML tree inferred under the TN+G4 model of evolution differed from this one at seven internal edges, but the overall conclusion remained the same (see text).
(i) the presence of a single mtDNA haplotype (Cbrev-01) in Timor-Leste and across the distributional range of C. brevitarsis in Australia potentially indicates that populations in Timor-Leste may have been the source of Australian populations, and that members of these populations may have been carried by trade winds to northern Australia/north-western Australia before founding the first Australian populations of C. brevitarsis; (ii) a relatively long history of Cbrev-01 in northern Australia have enabled the spread of this haplotype to the southern-most Australian limit of this species' habitable range (Fig 1); (iii) the presence of other mtDNA haplotypes (e.g., Cbrev-03 in Kalumburu (in 2008 and 2009) and Katherine (in 2009

Table 1 .
Culicoides samples used in this study including known virus vectored and taxonomic status.

Table 2 .
Primers for mtDNA COI regions and exon-primed intron-crossing (EPIC) primers for ribosomal protein nuclear genes used to infer population structure of Culicoides species.

Table 3 .
AMOVA results of Australian Culicoides brevitarsis populations based on mtDNA and nuDNA markers.
Results from Analysis of Molecular Variance (AMOVA) of Culicoides brevitarsis populations in Australia based on (a) mtDNA COI partial gene sequence 829bp (502bp in parentheses for comparisons with C. marksi), and (b) four EPIC and two microsatellite DNA loci.Note: Fixation indices based on mtDNA COI partial sequence for: among groups (F CT ) are northern Australian (Kalumburu, Douglas-Daly, and Katherine) and eastern Australian sites (Lismore, Grafton, Bellingen, and Paterson); among populations within groups (F SC ), and within populations (F ST ).All P values were non-significant.Fixation indices estimated from EPIC-PCR and microsatellite DNA markers were also non-significant for among groups (F CT ), for among individuals within populations (F IS , i.e., non-random mating among individuals within subpopulations), for F IT (i.e., average reduction in heterozygosity of an individual relative to the total population, and represents contributions due to non-random mating within demes (subpopulations) and effects of random genetic drift among demes (F ST )), but significant for among populations within groups (F SC ; P-value = 0.047 ± 0.008), indicating substantial variance existed at this population structure hierarchical level.doi:10.1371/journal.pone.0146699.t003

Table 4 .
Australian Culicoides brevitarsis population mtDNA COI pairwise F ST estimates and associated P values ± SE.
Note: Australian Culicoides brevitarsis population mtDNA COI pairwise F ST estimates (lower triangle) and associated P values ± SE (upper triangle) based on 829bp of mtDNA COI partial gene sequence.Intermediate level of population substructure is detected between Grafton and Douglas-Daly (F ST = 0.122) and between Grafton and Paterson (F ST = 0.117) but with non-significant P values and SD from 500 bootstrap replications.doi:10.1371/journal.pone.0146699.t004

Table 5 .
Australian Culicoides brevitarsis population pairwise F ST estimates and associated P-values inferred from nuDNA markers.ST estimates (lower triangle) and corresponding P values (upper triangle) between Culicoides brevitarsis populations as inferred based on genotype data from four EPIC and two microsatellite DNA markers.Significant F ST P values (< 0.05) are in red based on 110 permutations as calculated using Arlequine V3.11.Kalumburu A and Kalumburu B are two Western Australian populations collected from the same sites but from different years ('A' 29/12/08; 'B' 25/6/09).Katherine and Douglas-Daly are from the Northern Territory, and Bellingen, Lismore, Paterson and Grafton are New South Wales populations.Note: Only F ST estimates for Douglas-Daly/Lismore and Douglas-Daly/Paterson remained significant following a Holm-Bonferroni correction at the α = 0.05 level (bolded).Treating Kalumburu A & B as a single population (due to the insignificant F ST value between these two samples) indicated the F ST estimate between Kalumburu and Lismore was not significant (F ST = 0.029, P = 0.135±0.023).

Table 6 .
Nuclear DNA marker statistics for Australian C. brevitarsis populations.
e = expected heterozygosity, P (Random F IS observed F IS ).Markers were overall in Hardy Weinberg equilibrium for all populations with non-significant P values.All F IS estimates for all populations were also not significant, although Kalumburu (A & B) F IS estimates were high, but low (F IS : 0.087, P = 0.252) when both populations combined as one.doi:10.1371/journal.pone.0146699.t006