Phylogeny of the Archiborborinae (Diptera: Sphaeroceridae) Based on Combined Morphological and Molecular Analysis

The Archiborborinae is a diverse Neotropical subfamily of Sphaeroceridae, with many undescribed species. The existing generic classification includes three genera consisting of brachypterous species, with all other species placed in the genus Archiborborus. We present the first phylogenetic hypothesis for the subfamily based on morphological, molecular, and combined datasets. Morphological data include 53 characters and cover all valid described taxa (33 species in 4 genera) in the subfamily, as well as 83 undescribed species. Molecular data for five genes (mitochondrial 12S rDNA, cytochrome c oxidase subunit I, and cytochrome B, and nuclear alanyl-tRNA synthetase and 28S rDNA) were obtained for 21 ingroup taxa. Data support the separation of the Archiborborinae from the Copromyzinae, with which they were formerly combined. Analyses support consistent groups within the subfamily, but relationships between groups are poorly resolved. The validity of the brachypterous genera Penola Richards and Frutillaria Richards is supported. The former genus Archiborborus Duda is paraphyletic, and will be divided into monophyletic genera on the basis of this work. Aptery and brachyptery have evolved multiple times in the subfamily. Antrops Enderlein, previously including a single brachypterous species, is a senior synonym of Archiborborus.


Introduction
The Archiborborinae is an entirely Neotropical clade of Sphaeroceridae, first recognized as a subfamily by Kits and Marshall [1]. The most recent classification of the subfamily [2] (as the tribe Archiborborini in the subfamily Copromyzinae) includes four genera: Antrops Enderlein (type: Antrops truncipennis Enderlein), Archiborborus Duda (type: Archiborborus submaculatus Duda (= Archiborborus femoralis (Blanchard))), Penola Richards (type: Penola eudyptidis Richards), and Frutillaria Richards (type: Frutillaria kuscheli Richards). All except Archiborborus include only flightless species with highly reduced wings. The subfamily is speciose but poorly known; we recently described seven new species in the genus Frutillaria [1] and will describe approximately 80 new species of Archiborborinae in upcoming papers. Phylogenetic analysis is required to resolve two taxonomic problems relating to the subfamily: the relationships of the Archiborborinae to other Sphaeroceridae, and the generic classification of the subfamily.
The Archiborborinae were first treated as a group by Hackman [3] although Richards [4,5] earlier acknowledged the relationship between his new genera Penola and Frutillaria and the genus Archiborborus. The group was treated as a tribe, Archiborborini, by Norrbom and Kim [6], who considered the included genera to form a clade sister to the Holarctic and Old World genera of the tribe Copromyzini. Although Norrbom and Kim included several archiborborines as outgroup taxa in their analysis of copromyzine relationships, they did not explicitly analyse whether the two tribes were in fact sister taxa. As well, none of these previous authors have attempted to resolve relationships within the subfamily in a cladistic analysis.
This analysis represents the first phylogenetic hypothesis for the Archiborborinae. The molecular analysis is also the first published for the Sphaeroceridae, although sphaerocerids have been included as outgroups in previous phylogenetic studies on other groups [7][8][9] and were included in the FLYTREE project [10]. Furthermore, with outgroups representing several major clades of Sphaeroceridae, this is the first study to provide quantitative evidence for subfamily-level phylogenetic relationships of the family. Although the resolution of our results is fairly low, we recover several groups consistently and provide limited data on their relationships.

Taxon sampling
All 33 valid, described species of Archiborborinae, as well as 83 undescribed species, were included in the morphological matrix.
Outgroups included multiple representatives of four of the five non-archiborborine sphaerocerid subfamilies. An undescribed species of the genus Pycnopota, which cannot be confidently placed in any known sphaerocerid subfamily, was also included. Non-sphaerocerid outgroups included two representatives of the heleomyzid subfamily Cnemospathidinae (sensu McAlpine [11]). The taxa selected for sequencing represented most of the clades identified in the morphological analysis (Figures 1, 2), as well as outgroups representing the Heleomyzidae, Pycnopota, and three other sphaerocerid subfamilies. Undescribed species are referred to in the text and figures with single names in quotation marks; these names refer to manuscript names and are not considered published under the rules of the ICZN.

Morphological characters
The characters used for the morphological analysis include previously published characters thought to characterize the Archiborborinae or groups within the subfamily, as well as a number of newly recognized characters. Some characters of broader significance within the Sphaeroceridae were included in an effort to resolve the position of the Archiborborinae within the family. The complete matrix is presented in Nexus S1.

1) Interfrontal setae: 0: absent or scattered across frons; 1: present in a distinct row. This character is usually regarded as a synapomorphy for the Sphaeroceridae. The derived state of this character is also found in members of the Milichiidae, and is approached in the Australian genus Borboroides, originally described as a sphaerocerid but now placed in the Heleomyzidae.
… lateral corners produced, with tufts of long setae; 2: very broad, longest medially; 3: broad, with medial notch; 4: broad, with paired medial lobes; 5: lateral corners detached.
39) Tergite 6: 0: present; 1: absent. This character is only present in Tucma among the Sphaeroceridae, although it is widespread among other Acalyptratae. Marshall [12] suggested this indicated a sister-group relationship between Tucma and the rest of the family.

Molecular characters
Portions of five genes were sequenced: mitochondrial 12S rDNA (portion of 5′ end, 358 bp), cytochrome c oxidase subunit I (COI; Folmer or barcode region of 5′ end, 655 bp), and cytochrome B (CytB; 3′ end, 712 bp), and nuclear alanyl-tRNA synthetase (AATS; portion of 5′ end, 401 bp) and 28S rDNA (5′ end encompassing expansion segments D1 and D2, 662 bp). The data set thus includes both mitochondrial and nuclear genes, and both ribosomal and protein-coding genes. Genes were selected based on the availability of primers, expected rate of evolution, and prior successful use in phylogenetic analyses of Diptera. A few additional species were sequenced for COI only by the Canadian Center for DNA Barcoding following the methodology of Smith et al. [13] (all COI sequences generated for the project are available with Genbank accession numbers JX260352-JK260397). All sequences are available on Genbank, and sequences and supporting trace files are available from the BOLD workbench [14]. Specimen voucher numbers, accession numbers and BOLD sample numbers are listed in Table S1. Sequences for Rachispoda sp. (from [8]) were downloaded from Genbank.
… Table 1. All products were visualized on 1% agarose electrophoresis gels with 2.7% ethidium bromide (10 mg/mL) using UV transillumination.

Sequencing
Purified products were prepared for sequencing using an ABI BigDye® Terminator v3.1 Cycle Sequencing kit (PE Applied Biosystems, Foster City, CA, USA). DNA sequencing was performed at the Agriculture & Agri-Food Canada Eastern Cereal and Oilseed Research Centre Core Sequencing Facility (Ottawa, ON, Canada) on an ABI 3130xl Genetic Analyzer (PE Applied Biosystems, Foster City, CA, USA) using the ABI ethanol/EDTA/sodium acetate protocol. Sequence chromatograms were viewed and contigs assembled in BioEdit [20].

Alignment
Alignment for 12S, COI, CytB, and AATS was performed using the Clustal algorithm implemented in BioEdit. Alignment was straightforward for these genes, with the only indel consisting of an amino acid insertion in AATS for Apteromyia. Alignment of the expansion segments of 28S is difficult with standard alignment algorithms, and so an alternative procedure was followed for this gene. Initial alignment was performed manually based on the published secondary structure for Drosophila melanogaster [21]; the expansion segments were then identified and aligned separately with the program LocARNA [22], which performs multiple alignment based on predicted folding properties of RNA sequences. A single loop region in expansion segment D2, corresponding to positions 454 to 460 in the aligned sequence, was highly variable in length and composition between taxa. It could not be aligned with confidence and was excluded from analysis. The aligned dataset is available in Nexus S2.

Data analysis
Analyses were based on morphological data for 128 taxa (116 Archiborborinae and 12 outgroup taxa), molecular data for 28 exemplar taxa, and combined data for both the 28 exemplar taxa and all 128 taxa (the latter including additional COI sequences from species not included in the molecular-only analysis). All data sets were analysed using both parsimony and Bayesian methods.
Parsimony analyses were conducted in TNT (Willi Hennig Society edition [23]). All characters were treated as unordered and reversible. Bremer indices were calculated for the strict consensus of each analysis. Gaps in the molecular data were treated as missing characters. The tree search was carried out in two steps: first, a New Technology search incorporating sectoral search and tree fusing to find the minimum length 10 times; second, a traditional (heuristic) search applying TBR swapping to the trees found in the first step. Character transformation was analyzed in PAUP* 4.0b10 [24] with ACCTRAN optimization.
Partitioning strategies for the molecular components of the Bayesian analysis were analysed using Phycas [25]. Phycas implements a stepping stone method for accurate calculation of the marginal likelihoods of different models, allowing comparison of different partitioning strategies [26]. Four different partitioning strategies were compared: no partitioning, partitioning by gene (5 partitions), partitioning by codon position for all three protein-coding genes (5 partitions), and partitioning by gene and codon position (11 partitions). The initial reference tree was generated in Phycas using partitioning by gene and allowing polytomies, and used for the marginal likelihood calculations (β = 11, with 1000 cycles per β value). Marginal likelihoods were then compared using Bayes factors [27].
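The comparison of partitioning strategies by Bayes factors can be illustrated with a minimal sketch. It assumes that a log marginal likelihood is already available for each strategy; the numbers below are hypothetical placeholders, not the values in Table 3, and the function names are illustrative only. The statistic shown is the commonly used 2 ln(BF), twice the difference in log marginal likelihood between two models.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """Return 2*ln(BF) in favour of model A over model B,
    given the log marginal likelihoods of the two models."""
    return 2.0 * (log_ml_a - log_ml_b)


def rank_models(log_mls):
    """Find the model with the highest log marginal likelihood and
    report 2*ln(BF) of that model against every candidate."""
    best = max(log_mls, key=log_mls.get)
    table = {name: two_ln_bayes_factor(log_mls[best], value)
             for name, value in log_mls.items()}
    return best, table


# Hypothetical log marginal likelihoods (NOT the study's values).
hypothetical_log_mls = {
    "unpartitioned": -21500.0,
    "by_gene": -21200.0,
    "by_codon": -20900.0,
    "by_gene_and_codon": -20750.0,
}

best, table = rank_models(hypothetical_log_mls)
for name, value in sorted(table.items(), key=lambda item: item[1]):
    print(f"{name:20s} 2lnBF of best model over this one = {value:.1f}")
```

Larger values indicate stronger support for the best model over the alternative; a value of zero marks the best model itself.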
Bayesian analyses were conducted using MrBayes versions 3.1.2 and 3.2.1 [28][29][30]. Some MrBayes runs were completed through the CIPRES Science Gateway implemented on the Trestles TeraGrid cluster [31]. All parameters except topology and branch lengths were unlinked between partitions. Analyses consisted of 2 runs with 4-6 chains each, and were run for 10-60 million generations, depending on how rapidly convergence was reached. Temperature was set to 0.08 to improve mixing and convergence. Convergence was assessed using the cumulative, slide and compare options in AWTY [32] to assess split frequency within and between runs, respectively. Trace plots of model parameters were also examined using Tracer [33] to assess mixing and the adequacy of priors.
Rather than determining the nucleotide substitution model for each partition a priori, we used the nst = mixed setting implemented in MrBayes 3.2. This setting allows the MCMC analysis to sample across the GTR model space [34]. For all partitions, among-site rate variation was accommodated using gamma-distributed rates (+Γ). Although substitution models incorporating a parameter for a proportion of invariable sites (+I) are often used in phylogenetic analyses, gamma-distributed rates can accommodate a wide range of substitution rates, and combining +Γ and +I parameters in partitioned data sets can result in unreasonable parameter values in the model [26]. The morphological data partition was analysed under the Mk+Γ model [35]; transition rate asymmetry was accommodated by setting the hyperprior for the symmetric Dirichlet distribution to exponential(1.0). As only informative characters were included in the morphological dataset, the coding option was set to inf.
For analyses with the full taxon set, the consensus trees were poorly resolved. To test whether a few unstable taxa were obscuring the phylogenetic signal, we used the RogueNaRok web service [36]. For parsimony analyses, we optimized the number of bipartitions in the strict consensus trees; for Bayesian analyses, we optimized the support in the majority-rule consensus. In both cases, the default RogueNaRok algorithm was used. Taxa identified as potential wildcards were pruned from trees if their removal improved the resolution or support of the backbone of the tree.
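The idea behind wildcard pruning can be shown with a toy example. The sketch below is a brute-force illustration, not the RogueNaRok algorithm: it assumes rooted trees written as nested Python tuples, treats each clade as a set of taxon names, and asks which single taxon, when removed from every tree, leaves the largest number of clades in the strict consensus. The taxon labels and function names are hypothetical.

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree, pruned=frozenset()):
    """Return the non-trivial clades of a rooted tree, ignoring pruned taxa."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset() if node in pruned else frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    all_taxa = visit(tree)
    # Drop single-taxon "clades" and the clade containing every taxon.
    return {c for c in found if 1 < len(c) < len(all_taxa)}


def strict_consensus_clades(trees, pruned=frozenset()):
    """Clades present in every tree (the strict consensus)."""
    return set.intersection(*(clades(t, pruned) for t in trees))


def best_single_prune(trees):
    """Try pruning each taxon in turn; return the baseline clade count,
    the taxon whose removal gives the most consensus clades, and that count."""
    baseline = len(strict_consensus_clades(trees))
    scores = {taxon: len(strict_consensus_clades(trees, frozenset([taxon])))
              for taxon in leaves(trees[0])}
    best = max(sorted(scores), key=scores.get)
    return baseline, best, scores[best]


# Three toy trees that agree except for the position of taxon "X".
toy_trees = [
    ((("A", "X"), "B"), ("C", ("D", "E"))),
    (("A", "B"), ("C", (("D", "X"), "E"))),
    (("A", "B"), (("C", "X"), ("D", "E"))),
]

baseline, rogue, resolved = best_single_prune(toy_trees)
print(baseline, rogue, resolved)
```

In this toy case the strict consensus of the three trees is completely unresolved, while removing "X" recovers all three clades shared by the underlying topology.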

Morphological analysis
Parsimony analysis of the morphological dataset yielded a large number of trees (length 210, CI: 0.319, RI: 0.850). A set of 10,000 trees was retained for further analysis; saving additional trees did not affect the consensus (Figures 1, 2). Archiborborinae were recovered as monophyletic, supported by three unambiguous character changes (mapped in Figures 3, 4): the cleft epandrium (character 40, shared with Copromyza sp. as well as other Copromyzinae not included in the matrix), ocellar bristles anterior of anterior ocellar triangle (character 2, shared with Tucma tucumana), and the dense patch of setae on the alula (character 18, a unique synapomorphy). Within the Archiborborinae, we have given informal group names (based on distinctive characters or representative species) to several of the clades recovered as monophyletic to simplify further discussion. Of these groups, the spotted wing group, consisting of Archiborborus hirtipes and an undescribed sister species ("altiplanus"), was recovered as sister to all other archiborborines. The clade consisting of Penola and Frutillaria formed a basal polytomy with the undescribed species "daedalus" and the remaining archiborborines. The remaining species largely formed an unresolved bush, with some clustered into clades within the bush. Bremer indices were low, with most nodes collapsing in trees 1 step longer. Three additional clades were recovered in the strict consensus after pruning the taxa "apterus" and "biflavus": one comprising most of the taxa in the bush except the colourful legs and orbitalis groups, one comprising a group of wingless species (apterous group), and one including the emarginatus and mexicanus groups.
Bayesian analysis of the morphological dataset yielded a monophyletic Archiborborinae (pp = 0.87), but with most species groups branching from a single polytomy (Figures 5, 6). Both the spotted wing group and Penola + Frutillaria were recovered in this unresolved bush. The species groups recovered in the parsimony analysis were also recovered as monophyletic, many with high posterior probabilities (e.g., quadrilobus, 0.97; orbitalis, 0.99; mexicanus, 1.0). A few taxa were found to increase support for various subclades when pruned, but none improved resolution within the large bush and so we present results with all taxa included.
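The consistency index (CI) and retention index (RI) quoted for the parsimony trees in this section follow the standard ensemble definitions: CI = M/S and RI = (G - S)/(G - M), where S is the observed tree length, M the minimum possible number of steps, and G the maximum possible number of steps, each summed over characters. The sketch below shows the arithmetic; the per-character step counts are hypothetical and are not taken from the matrix in Nexus S1.

```python
def consistency_index(min_steps, observed_steps):
    """Ensemble CI: total minimum possible steps divided by tree length."""
    return sum(min_steps) / sum(observed_steps)


def retention_index(min_steps, max_steps, observed_steps):
    """Ensemble RI: (G - S) / (G - M), each term summed over characters."""
    m = sum(min_steps)
    g = sum(max_steps)
    s = sum(observed_steps)
    return (g - s) / (g - m)


# Hypothetical step counts for three characters (illustration only).
min_steps = [1, 1, 2]        # fewest steps each character could require
max_steps = [4, 5, 6]        # steps each would require on a star tree
observed_steps = [2, 1, 4]   # steps on the tree being evaluated

ci = consistency_index(min_steps, observed_steps)
ri = retention_index(min_steps, max_steps, observed_steps)
print(f"length = {sum(observed_steps)}, CI = {ci:.3f}, RI = {ri:.3f}")
```

A low CI combined with a high RI, as in the morphological analysis, indicates that characters change many times on the tree but that those changes still group taxa consistently.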

Molecular analysis
Sequence data were obtained for 25 (COI) to 27 (12S) taxa for each gene fragment, and data for one additional taxon were obtained from Genbank. Between 10.3% (12S) and 35.0% (CytB) of the bases for each gene were parsimony informative, while G+C content ranged from 23.5% (12S) to 46.8% (AATS) (Table 2). The partitioning strategy test showed that partitioning by gene and codon was better than the other models by a considerable margin (Table 3), and so this strategy was used for all analyses.
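The two summary statistics reported per gene can be computed directly from an alignment. The following sketch assumes an alignment held as a list of equal-length strings, counts only unambiguous bases (A, C, G, T), and uses the usual definition of a parsimony-informative site: at least two different states, each present in at least two taxa. The toy alignment and function names are illustrative and are not drawn from Nexus S2.

```python
from collections import Counter

BASES = "ACGT"


def gc_content(seqs):
    """Fraction of G or C among all unambiguous bases in the alignment."""
    counts = Counter(b for s in seqs for b in s.upper() if b in BASES)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(b for b in column.upper() if b in BASES)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_fraction(seqs):
    """Fraction of alignment columns that are parsimony informative."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    columns = ("".join(s[i] for s in seqs) for i in range(length))
    return sum(is_parsimony_informative(c) for c in columns) / length


# Toy four-taxon alignment; "-" marks a gap, treated here as missing data.
toy_alignment = ["ACGTA", "ACGTT", "ATGAA", "ATG-T"]

print(f"G+C content: {gc_content(toy_alignment):.3f}")
print(f"Informative sites: {informative_fraction(toy_alignment):.1%}")
```

Treating gaps as missing matches the handling of gaps described for the parsimony analyses; other conventions would change the counts.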
Parsimony analyses produced a well-resolved consensus (Figure 7; 3 trees, length 3569, CI: 0.368, RI: 0.345). The analysis placed the representatives of the colourful legs, yellow forelegs, mexicanus, and emarginatus groups as sister to the remaining archiborborines, with the mexicanus and emarginatus representatives sister to each other. Bremer indices for the backbone were low. The Bayesian analysis was also well resolved, with moderate to strong support along the backbone (Figure 8). The same four exemplar species were recovered as sister to the remaining archiborborines. Positions of the remaining ingroup taxa were also similar, although with some rearrangements.

Combined analysis
Parsimony analysis of the 28-taxon combined data set yielded two equally parsimonious trees (length 3734, CI: 0.372, RI: 0.356). The strict consensus (Figure 9) had a very different topology from those recovered in the molecular analysis, with the representatives of the apterous and orbitalis groups recovered as sister to the remaining archiborborines. The taxa that had been recovered as the basal lineages in the molecular analysis were found to form a weakly supported clade, along with "bellavista". The Bayesian analysis, however, produced a topology very similar to the molecular analysis (Figure 10). Posterior probabilities were similar, with only slightly higher or lower support for most clades.
The parsimony analysis of the 128-taxon data set produced a very large number of equally parsimonious trees (length 4041, CI: 0.349, RI: 0.453). As with the morphological analysis, 10,000 trees were retained for further analysis. The strict consensus tree (Figures 11, 12) was fairly well resolved. The apterous and orbitalis groups were placed as sister to each other, and together as sister to the remaining archiborborines. Many of the species placed in the large unresolved bush in the morphological analysis were recovered in a single clade, also including the quadrilobus group. However, Bremer support indices were low. The Bayesian analysis (Figures 13, 14) was not as well resolved; as with the morphological dataset, most species and groups were recovered in a single bush. Unlike the parsimony analysis, a clade including the spotted wing group, Penola and Frutillaria, and "daedalus" was recovered as sister to the remaining Archiborborinae; the clade exclusive of those taxa was supported with moderate posterior probabilities (pp = 0.84). Pruning two taxa, "vittifrons" and "echinus", improved support somewhat and added some resolution to the backbone. Specifically, two clades comprising the yellow forelegs and colourful legs groups and the emarginatus and mexicanus groups respectively were recovered as subtending a clade containing the remaining taxa and groups. However, support for this arrangement was still very low, with posterior probabilities just over 0.5.

Monophyly and position of the Archiborborinae
The Archiborborinae were supported as monophyletic in all analyses. There are several apparent morphological synapomorphies for the subfamily (Figures 3, 4). All fully winged species have a dense patch of flattened setae on the calypter (character 18), unlike other members of the family. This is the only unreversed unique synapomorphy for the subfamily, although it could not be coded for apterous species. When present, the ocellar bristles are inserted at or anterior to the level of the median ocellus (character 2); this character is shared only with Tucma in the Sphaeroceridae. The postocellar bristles are also greatly enlarged in all species where they are present (character 4); this character also occurs in Pycnopota and some Limosininae, but not in the Copromyzinae. The epandrial cleft (character 40) was also recovered as a synapomorphy in the morphological analysis, shared only with Copromyza; however, this character occurs more widely in unsampled Copromyzinae and a possibly homologous state occurs in the Sphaerocerinae.
The positions of the Archiborborinae and the Copromyzinae in all analyses contradicted their previous treatment as a single subfamily. The Copromyzinae and Sphaerocerinae consistently formed a monophyletic clade to the exclusion of the Archiborborinae. Synapomorphies for the Sphaerocerinae-Copromyzinae clade include the loss of the katepisternal bristle (character 12) and probably the presence of only two spermathecae (character 53), although the latter is also widespread elsewhere in the Sphaeroceridae and is also found in the spotted wing group of Archiborborinae. In the molecular and combined analyses, the exemplars of Lotophila and Parasphaerocera formed a well-supported clade within this group, sister to the exemplar of Copromyza. While the sphaerocerines clearly form a monophyletic group within the Sphaerocerinae-Copromyzinae clade, there are no clear morphological synapomorphies for the Copromyzinae s.s. and it is probable that the subfamily is not monophyletic. Further phylogenetic analysis with more extensive taxon sampling in this clade is needed.
The position of the enigmatic taxon Pycnopota varied between analyses. None of the positions were well supported. Very little is known of this genus, which includes a single described species from Bolivia and undescribed species from Brazil (specimens in United States National Museum of Natural History, Smithsonian Institution, Washington and Museu de Zoologia, Universidade de Sao Paulo) and Costa Rica (specimens in University of Guelph Insect Collection). Roháček et al. [2] considered the genus a possible member of the Heleomyzidae, but other than the spinose costa its morphology is that of a sphaerocerid, and the phylogenetic results corroborate this. Final resolution of its position will require broader taxon sampling of non-archiborborine Sphaeroceridae and Heleomyzidae, but it is clearly distinctive and may represent a new monotypic subfamily.
The relationship between the Archiborborinae and the remaining Sphaeroceridae was moderately well resolved from the data. In most analyses it was placed in a clade with Pycnopota and the Sphaerocerinae+Copromyzinae clade, although this clade was only well supported in a few analyses. This grouping (excluding Pycnopota) was suggested by Marshall [37] as monophyletic based on the presence of the epandrial cleft. This character is present in all Archiborborinae and many Copromyzinae (apparently reversed in a clade of four genera, including Lotophila [38]), while the Sphaerocerinae have a possibly homologous state with the epandrium completely divided above the anus. However, final resolution of the position of the Archiborborinae and relationships of the subfamilies may require more extensive taxon sampling, particularly including molecular data for Tucma and problematic genera such as Palaeoceroptera Duda.

Phylogenetic relationships of the Archiborborinae
Analyses produced mixed results within the Archiborborinae. The morphological and combined analyses supported fairly consistent species groups within the subfamily. However, the relationships between the species groups were inconsistent and generally poorly supported, and a number of species were not placed in species groups in some analyses. Basal nodes in particular were poorly resolved in all analyses, with short branch lengths; this may indicate that the subfamily underwent rapid divergence early in its evolutionary history.
The existing generic classification of the Archiborborinae [2] places all winged species in Archiborborus. Despite the conflicting results of different analyses, this classification is clearly untenable. Winged and brachypterous or apterous species occur throughout the subfamily. The all-taxa combined parsimony analysis suggests wing loss occurred seven times, while the all-taxa combined Bayesian analysis suggests at least five (and possibly seven) occurrences. Wing loss is quite common among the Sphaeroceridae; although no complete phylogeny for the family is available, flightless species are currently known in 30 genera in the family (unpublished data), and in at least some of those genera there have probably been multiple instances of wing loss (e.g., Aptilotus Mik [39][40]). Clearly this character is not reliable as a primary basis for a cladistic classification. All analyses show that the type species of Antrops, Penola, and Frutillaria are within clades including winged species, rendering such a broad concept of Archiborborus paraphyletic. Frutillaria is monophyletic and sister to Penola, and so the existing treatment of these genera [1] can be maintained. Archiborborus hirtipes and its undescribed sister species, as well as the undescribed "daedalus", are found either in a clade with Frutillaria and Penola (Bayesian analysis of morphology, all molecular and combined analyses) or as a basal grade alongside those two genera (parsimony analyses of morphology). Two new genera will be erected to contain them (Kits and Marshall, submitted).
Conflicting data pose some difficulty in further revising the classification of the remaining species in the subfamily. The combined parsimony analysis provides the most resolved treatment including all taxa in the subfamily, but support for most clades is very low. The type species of Antrops and Archiborborus are consistently placed in the same clade, and thus Archiborborus is best treated as a junior synonym of Antrops. Further splits of the informal groups identified here may be appropriate, but need to be justified in the context of a complete species-level revision of the subfamily (Kits and Marshall, submitted). Further data, particularly more molecular data for under-sampled clades, will undoubtedly be useful in further refining the classification of the subfamily and ensuring that generic concepts are able to stabilize.

Supporting Information
Table S1 Specimen data for molecular exemplars. Voucher numbers for newly sequenced specimens indicate the unique identification number in the University of Guelph Insect Collection specimen database which holds full collection details; this number is also printed on individual specimen labels. Sequence data is stored in both Genbank and BOLD; the latter also includes trace files for sequences. (DOCX)
Nexus S1 Matrix of morphological characters.