Molecular Phylogeny and Historical Biogeography of the Neotropical Swarm-Founding Social Wasp Genus Synoeca (Hymenoptera: Vespidae)

The Neotropical Region harbors high biodiversity and many studies on mammals, reptiles, amphibians and avifauna have investigated the causes for this pattern. However, there is a paucity of such studies that focus on Neotropical insect groups. Synoeca de Saussure, 1852 is a Neotropical swarm-founding social wasp genus with five described species that is broadly and conspicuously distributed throughout the Neotropics. Here, we infer the phylogenetic relationships, diversification times, and historical biogeography of Synoeca species. We also investigate samples of the disjoint populations of S. septentrionalis, which occur both from northwestern South America through Central America and in the Brazilian Atlantic rainforests. Our results showed that the interspecific relationships for Synoeca could be described as follows: (S. chalibea + S. virginea) + (S. cyanea + (S. septentrionalis/S. surinama)). Notably, samples of S. septentrionalis and S. surinama collected in the Atlantic Forest were interrelated and may be the result of incomplete lineage sorting and/or mitochondrial introgression among them. Our Bayesian divergence dating analysis revealed recent Plio-Pleistocene diversification in Synoeca. Moreover, our biogeographical analysis suggested an Amazonian origin of Synoeca, with three main dispersal events subsequently occurring during the Plio-Pleistocene.


Introduction
The Neotropical Region is one of the most important biodiversity hotspots in the world, harboring more than half of the Earth's remaining rainforests [1]. Recent studies documenting the diversification history of the Neotropical biota have elucidated both temporal and spatial biogeographic patterns, resulting in proposals that diversification events in the South American biota have been driven by the uplift of the Andes, marine incursions, or Pleistocene climate changes [2][3][4][5]. In addition, studies involving mammals and birds have allowed the inference of historical connections between two South American morphoclimatic domains, the Amazonian rainforest (AM) and the Atlantic Forest (AF) (e.g., [6][7]). Currently these two forests are separated by a dry corridor of open vegetation [8], also called the 'Dry Diagonal', that encompasses the Argentinean and Paraguayan Chaco, the Caatinga in northeastern Brazil, and the central Brazilian Cerrado [6,8].
Several previous studies have sought to understand the phylogenetic relationships among Synoeca species [20][21][22]. These analyses used only morphological characters and supported the monophyly of the genus, with the following relationships among the species: S. chalibea + (S. virginea + (S. septentrionalis + (S. surinama + S. cyanea))) [20,22]. However, species of Synoeca exhibit wide morphological variation, as indicated by Cely and Sarmiento [21] and the presence of multiple synonyms for species [9]. Moreover, the recent record of colonies of S. septentrionalis in the AF [19] raises questions about the taxonomy and evolutionary history of the group.
Here we infer the first molecular phylogeny for Synoeca species using mitochondrial and nuclear DNA sequences. We also provide new observations of morphological structures that can help diagnose some species. Furthermore, we investigate the status of the disjoint populations of S. septentrionalis. Bayesian divergence dating analysis calibrated with an a priori mutation rate of insect mitochondrial DNA was used to estimate a timescale for diversification events in the genus. We also provide a Bayesian analysis of historical biogeography that suggests an Amazonian origin for Synoeca.

Taxon sampling
Our sampling included specimens from multiple localities within known species distributions. Table 1 and Fig. 1 show the collection sites of specimens used in this study. The map was generated using the software Quantum-GIS v1.8.0 (Open Source Geospatial Foundation Project, Beaverton, OR, USA). All specimens were preserved in ethanol prior to the molecular analyses and vouchers are deposited in the entomological collections at the Universidade Estadual de Santa Cruz, Ilhéus, Bahia, Brazil and the National Museum of Natural History (USNM, Washington,

Species identification and morphological observations
Species identification was based on the keys provided by Richards [9], Andena et al. [20], and Cely and Sarmiento [21]. We observed additional external morphological structures that helped in identifying specimens, as follows: punctation on the head; erect setae on the scape; and erect setae on the scutum (Fig. 2). Images of specimens were generated using a JVC KY-F75U digital camera mounted on a Leica Z16 APO stereomicroscope. All images were edited using Photoshop CS4 (Version 11.0) (Adobe Inc.). Table 1. doi:10.1371/journal.pone.0119151.g001

DNA extraction, amplification and sequencing
The mesosoma and/or hind legs were removed from each specimen and DNA was extracted using the phenol-chloroform method following the Han and McPheron protocol [23]. We amplified three mitochondrial gene fragments (16S, cytochrome b, and cytochrome c oxidase I) and one nuclear gene fragment (wingless) by PCR using specific primers and amplification conditions (S1 Table). PCR products were purified using exonuclease I and shrimp alkaline phosphatase and directly sequenced on an ABI Prism 3730 (Applied Biosystems) sequencer (Laboratório de Biotecnologia da FCAV-UNESP de Jaboticabal, SP). PCR products were sequenced in both directions and sequence contigs were assembled using Sequencher 5.1 (Gene Codes Corp., Ann Arbor, MI, USA). DNA sequences were aligned using Muscle 3.7 [24] (with default parameters) in MEGA 5.10 [25], with each of the four genes aligned separately. All sequences are deposited in GenBank and accession numbers are listed in Table 1.

Phylogenetic analyses
We included DNA data from 26 Synoeca samples with each of the five Synoeca species represented by multiple collection localities. We used species from four other genera as outgroups.
Most phylogenetic analyses were based on a concatenated data matrix (1829 base pairs) of the four gene fragments. The most appropriate model of nucleotide evolution and the best-fitting partitioning scheme were selected using PartitionFinder v1.1.1 [26] under the Bayesian information criterion (BIC) (Table 2). Phylogenetic inference was conducted by Bayesian inference (BI) performed using MrBayes v3.2.2 [27] and maximum likelihood (ML) performed using GARLI v2.0 [28]. BI was also performed separately on each of the three mitochondrial gene fragments (wingless contained little phylogenetic information) to examine potential conflicts in phylogenetic signal among genes. All BI analyses consisted of two independent runs of 50 million generations each, with four chains (temp = 0.1), sampled every 1000 generations. The burn-in, convergence, and stationarity were assessed using Tracer v1.5 [29]. We removed the first 20% of sampled generations and combined the remaining generations to produce the maximum credibility tree. We conducted 1000 ML bootstrap replicates in GARLI under the same partitions and nucleotide models as in BI. Trees from all analyses were visualized using FigTree v1.4.0 [30].
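As an illustration of how model choice under the BIC works in principle, the following minimal Python sketch scores a few candidate substitution models for an alignment of 1829 sites. The log-likelihoods and parameter counts are hypothetical values chosen for illustration only; they are not results of this study, and the sketch does not reproduce PartitionFinder's internal procedure (which, for example, also searches over partitioning schemes).

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: dict, n_sites: int) -> str:
    """Return the name of the candidate with the lowest BIC score."""
    scores = {
        name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()
    }
    return min(scores, key=scores.get)


# Hypothetical candidates: name -> (log-likelihood, free substitution parameters).
# These numbers are made up for illustration and are NOT from the study.
CANDIDATES = {
    "JC": (-5200.0, 0),
    "HKY+G": (-5050.0, 5),
    "GTR+G": (-5040.0, 9),
}

N_SITES = 1829  # length of the concatenated matrix reported in the text

if __name__ == "__main__":
    print(best_model(CANDIDATES, N_SITES))
```

With these illustrative numbers, the richer GTR + Γ model gains only a small amount of likelihood over HKY + Γ, so the ln(n) penalty on its extra parameters leaves the simpler model with the lower BIC.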
We conducted an additional analysis in which we combined the information from all gene trees into a single tree (species tree), since data from multiple genes and multiple individuals per species can be useful for resolving species trees [31][32]. We used *BEAST v1.8.0 [33] to infer the species tree. The five recognized species and the S. septentrionalis samples from AF were assigned as "species" in the analysis. The *BEAST run consisted of 50 million generations, a Yule process for the species tree prior, a piecewise linear and constant root model for population size, randomly generated starting trees for each gene, and a burn-in of 20%.

Divergence time estimation
We inferred divergence times under a Bayesian framework using BEAST v1.8.0 [34]. We generated the input file in BEAUti using the two mitochondrial protein-coding genes (COI + CytB) and the substitution model (GTR + Γ) selected by PartitionFinder. Only nucleotide data from ten specimens were included in these analyses in order to avoid missing data. We employed an uncorrelated lognormal relaxed clock model [35]. Clock models were unlinked, and substitution and tree models were linked among partitions. A Yule speciation process with a random starting tree was used for the tree prior. Given the poorly known fossil record for social wasps in general, with no fossils belonging to Synoeca [36], we applied the Brower [37] mutation rate of mitochondrial genes (under a normally distributed prior). This mutation rate, estimated at 2.3% My⁻¹, was based on a set of seven studies that provided age estimates of lineage splits ranging from 300 to 3,250,000 years ago. Two independent Markov chain Monte Carlo (MCMC) searches were conducted with 100 million generations each, with parameters sampled every 10,000 steps and a burn-in of 20%. We checked for convergence between runs and analysis performance with Tracer v1.5 using effective sample size (ESS) scores. The resulting trees were combined using TreeAnnotator v1.8.0 and the consensus tree with the divergence times was visualized in FigTree v1.4.0.
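To make the role of the rate calibration concrete, the following back-of-the-envelope Python sketch converts a pairwise sequence distance into a divergence time under a strict clock. It assumes that the 2.3% My⁻¹ figure is read as pairwise sequence divergence per million years (so the per-lineage rate is half that value); the distance used in the example is hypothetical. This is only a rough illustration and not the relaxed-clock BEAST analysis described above.

```python
# Assumption: 2.3% per My is interpreted as PAIRWISE divergence between two
# lineages, i.e. 0.023 substitutions per site per million years between them.
PAIRWISE_RATE_PER_MY = 0.023


def per_lineage_rate(pairwise_rate: float = PAIRWISE_RATE_PER_MY) -> float:
    """Substitutions per site per My along a single lineage."""
    return pairwise_rate / 2.0


def divergence_time_my(
    pairwise_distance: float, pairwise_rate: float = PAIRWISE_RATE_PER_MY
) -> float:
    """Strict-clock divergence time (My) from a pairwise distance.

    pairwise_distance is the proportion of differing sites between two
    sequences (ideally corrected for multiple hits).
    """
    if pairwise_distance < 0:
        raise ValueError("pairwise_distance must be non-negative")
    if pairwise_rate <= 0:
        raise ValueError("pairwise_rate must be positive")
    return pairwise_distance / pairwise_rate


if __name__ == "__main__":
    # Hypothetical example: two sequences differing at 4.6% of sites.
    print(divergence_time_my(0.046))
```

A relaxed-clock analysis lets the rate vary among branches around this calibrated mean, which is why its estimates carry credibility intervals rather than a single value.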

Historical biogeographic analysis
We performed the Bayesian Binary MCMC (BBM) method of biogeographical and ancestral state reconstruction implemented in RASP (Reconstruct Ancestral State in Phylogenies) 2.1b [38]. We used the tree obtained from *BEAST (species tree) and published occurrence data for the analyzed species [9] as input files for RASP. We assigned species distribution areas to geographical regions as follows: (A) Amazonian forest, (B) Middle America, (C) Atlantic forest, (D) Dry Diagonal (Cerrado, Chaco and Caatinga). The BBM analysis was run applying the model F81 + Γ and no outgroup was defined. We ran the analysis for 5 million generations, sampled every 1000 generations, with the first 1000 samples being discarded as burn-in.
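The sampling scheme above implies a fixed number of posterior samples and a burn-in fraction, which the following short Python sketch works out. The function names are illustrative and not part of RASP; the count ignores the chain's starting state, which some programs also record.

```python
def total_samples(generations: int, sample_every: int) -> int:
    """Number of samples recorded when sampling every `sample_every` steps."""
    if generations <= 0 or sample_every <= 0:
        raise ValueError("generations and sample_every must be positive")
    return generations // sample_every


def burnin_summary(generations: int, sample_every: int, discarded: int) -> tuple:
    """Return (retained samples, burn-in fraction) for a given discard count."""
    total = total_samples(generations, sample_every)
    if not 0 <= discarded <= total:
        raise ValueError("discarded must lie between 0 and the total samples")
    return total - discarded, discarded / total


if __name__ == "__main__":
    # BBM settings from the text: 5 million generations, sampled every 1000,
    # with the first 1000 samples discarded.
    retained, fraction = burnin_summary(5_000_000, 1000, 1000)
    print(retained, fraction)
```

Under these settings the discarded samples amount to the same 20% burn-in fraction used in the other Bayesian analyses of this study.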

Morphological observations
We observed abundant punctation on the head of Synoeca septentrionalis and S. chalibea, in contrast to S. virginea, S. cyanea, and S. surinama, which showed little or no punctation on the head (Fig. 2a, b). Furthermore, S. septentrionalis has many erect setae on the scape (Fig. 2d), as opposed to S. virginea, S. chalibea, and S. cyanea, which lack erect setae (Fig. 2c); S. surinama has few erect setae on the scape. Another noteworthy morphological character is the presence of erect setae on the scutum of S. septentrionalis, S. cyanea, and S. chalibea that are absent from S. surinama and S. virginea (Fig. 2e, f). Moreover, the two S. chalibea samples studied here showed differences in body color, one form with a dark body ("dark form") and the other form with the body entirely yellowish ("yellow form").
(Fig. 3). BI performed on each of the three mitochondrial gene fragments all show divergence into the two major clades in Synoeca and consistently place S. surinama into the three groups discussed above, but there are some potential conflicts regarding the placement of some S. septentrionalis specimens among the three gene fragments (S1 Fig.). The species tree generated by *BEAST was congruent with the concatenated BI and ML results with higher node support (PP: 1) for all ingroup clades (Fig. 4).

Divergence times and historical biogeography
The divergence dating analysis revealed a middle/late Miocene origin for Synoeca with subsequent diversification of extant species occurring in the Plio-Pleistocene (Fig. 5). The oldest divergence event was the split between the two major clades: (S. chalibea + S. virginea) and (S. cyanea + (S. surinama/S. septentrionalis)). The youngest diversification event was the separation of specimens of S. surinama and S. septentrionalis in the northern clade at 0.14 mya (95% HPD: 0.02-0.31 mya). The biogeographic ancestral area analysis supports an Amazonian origin for Synoeca, with three independent dispersal-vicariance events: one from Amazonia to the Dry Diagonal and AF, one secondarily from Amazonia to Central America, and one from Amazonia to the AF (Fig. 6).
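The 95% HPD (highest posterior density) intervals reported with each date are the narrowest intervals containing 95% of the posterior samples for a node age. The following Python sketch shows one simple way to compute such an interval from a list of sampled values; it is an illustrative implementation with made-up input, not the routine used by Tracer or TreeAnnotator.

```python
import math


def hpd_interval(samples, mass: float = 0.95) -> tuple:
    """Narrowest interval that contains a fraction `mass` of the samples."""
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("samples must not be empty")
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


if __name__ == "__main__":
    # Made-up "node age" samples for illustration only.
    ages = [0, 10, 11, 12, 30, 50]
    print(hpd_interval(ages, mass=0.5))
```

Because the interval is chosen by width rather than by equal tail probabilities, it can be asymmetric around the point estimate, as in the intervals reported here.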

Synoeca systematics
Although only five species of Synoeca have been described, specimens are often misidentified in collections. This problem was pointed out by Richards [9], for example concerning the misidentification of specimens of S. chalibea and S. virginea, which requires careful comparison. Indeed, the "yellow form" of S. chalibea is quite similar to S. virginea, but can be differentiated by the presence of punctation on the scutum and pronotum [20], as well as on the head. Also, specimens of S. septentrionalis from the AF show variation in the dark triangular area on the clypeus, as verified by Menezes et al. [19]; this area is a major character used in the diagnosis of this species according to Cely and Sarmiento [21]. In some specimens, this dark area is totally absent, which may cause this species to be misidentified as S. cyanea, which also has a reddish clypeus. Considering this, specimens from AF identified as S. cyanea in collections need to be carefully verified. Despite the morphological similarities between S. septentrionalis AF and S. cyanea, the new morphological characters described here can be useful for correct species identification.
The division of Synoeca into two major clades is in disagreement with previous morphological phylogenetic analyses [20,22]. However, morphological characters, such as punctation on the propodeum, clypeal-eye contact, malar space and wing color, might separate Synoeca into two groups (see [9,20]) as shown here: (S. chalibea + S. virginea) + (S. cyanea + (S. surinama/ S. septentrionalis)). Moreover, the previous placement of S. septentrionalis as sister to the clade (S. surinama + S. cyanea) is not supported by our molecular results. This clade in the morphological analyses is supported only by the absence of numerous erect outstanding setae on the first metasomal tergum and sternum (see [20]). However, Andena et al. [20] commented that this condition is homoplastic because it is found in several outgroups. Thus, this character may not be sufficient to establish relationships between the species within the clade S. cyanea + S. surinama/S. septentrionalis.

Historical biogeography
Our results also shed light on biogeographic patterns within Synoeca. Three Synoeca species occur in the AM (S. chalibea, S. virginea and S. surinama), S. septentrionalis occurs in the MA and AF, and only one species is restricted to the ESA, namely S. cyanea. Also, most of the species of the tribe to which Synoeca belongs (Epiponini) occur in or are restricted to the AM [9]. Moreover, our analysis of ancestral area reconstruction supports an Amazonian origin of Synoeca (Fig. 6). Thus, the genus may have experienced three main colonization events from Amazonia during the Plio-Pleistocene.
The oldest inferred route probably occurred in southern South America between the AM and the AF. There is much evidence to support contact in the past between the AF and the AM [6,7][39][40][41]. Batalha-Filho et al. [7] combined phylogenetic and distributional data of avifauna and suggested old connections (middle to late Miocene) between AM and AF through the current southern Cerrado and Pantanal and the transition towards the Chaco and palm savannas of Bolivia and Paraguay. Also, it has been suggested that taxa representing divergences through this connection in the AF are spatially restricted to the southern AF and upland forests in southern Bahia, Minas Gerais, Espírito Santo, Rio de Janeiro and São Paulo [7]. Thus, the current geographical distribution of S. cyanea and its time of diversification at 3.46 mya (95% HPD: 4.79-2.4 mya; Fig. 5) are consistent with this species colonizing by this route in southern South America.
A second route probably arose relatively recently toward the Central American rainforest region and southern North America via the Isthmus of Panama. The formation of the Isthmus of Panama during the Pliocene at ~3.5 mya led to the Great American Biotic Interchange [2,42]. The current distribution and diversification time of S. septentrionalis MA at 2.29 mya (95% HPD: 3.15-1.56 mya; Fig. 5) agree with this colonization route. Moreover, the lower species richness of Epiponini in southern North America and the Central American rainforest region compared to South America may be explained by recent dispersal via the Isthmus of Panama and, therefore, less time for species diversification.
The third route appears to have arisen recently in northeastern Brazil between AM and AF. Batalha-Filho et al. [7] suggest two connection pathways between AM and AF in northeastern Brazil, one through the coastal zones of Maranhão, Piauí, Ceará, and Rio Grande do Norte (Brazil), and another through Tocantins and Bahia (Brazil). Populations of S. surinama may have reached the AF by one of these routes and, after this connection was broken, diverged between AM/C and AF. Genetic differences among populations in both AM and AF are also found in the swarm-founding social wasp Angiopolybia pallens [43].

Atlantic Forest and S. septentrionalis
Despite the morphological similarity seen in S. septentrionalis specimens living in different regions, our molecular data suggest that S. septentrionalis and S. surinama specimens from the Atlantic Forest (AF) are interrelated and belong to a distinct lineage of Synoeca. Moreover, there appears to be a very recent division, at 0.38 mya (95% HPD: 0.67-0.16 mya; Fig. 5), between northern and central Atlantic forest groups (NAF and CAF). This division may be explained by the occurrence of refugia during the Pleistocene in the northern and central AF, as reported by Carnaval et al. [44] for amphibians. However, our molecular results may also be due to incomplete lineage sorting and/or mitochondrial introgression between S. septentrionalis AF and S. surinama AF. Nevertheless, there is insufficient morphological information to separate S. septentrionalis AF and S. septentrionalis MA and we suggest that they may represent two species with a very similar morphology (i.e., potential cryptic species). Further studies at the population level will be useful in characterizing the diversification processes occurring between members of S. surinama and S. septentrionalis.
Our molecular phylogenetic findings suggest that Synoeca species richness in the Neotropical Region may be underestimated due to morphological similarity and lack of broad geographical sampling. Further studies combining morphology, genetics, and population-level sampling should be seen as the main challenge in the future of phylogenetic research in social wasps as a whole. This work may result in the future recognition of additional species of social wasps in the Neotropics.

S1 Table. Target gene fragments, primer sequences, and origin of the primers used in this study.