Colour Patterns Do Not Diagnose Species: Quantitative Evaluation of a DNA Barcoded Cryptic Bumblebee Complex

Cryptic diversity within bumblebees (Bombus) has the potential to undermine crucial conservation efforts designed to reverse the observed decline in many bumblebee species worldwide. Central to such efforts is the ability to correctly recognise and diagnose species. The B. lucorum complex (Bombus lucorum, B. cryptarum and B. magnus) comprises one of the most abundant and important group of wild plant and crop pollinators in northern Europe. Although the workers of these species are notoriously difficult to diagnose morphologically, it has been claimed that queens are readily diagnosable from morphological characters. Here we assess the value of colour-pattern characters in species identification of DNA-barcoded queens from the B. lucorum complex. Three distinct molecular operational taxonomic units were identified each representing one species. However, no uniquely diagnostic colour-pattern character state was found for any of these three molecular units and most colour-pattern characters showed continuous variation among the units. All characters previously deemed to be unique and diagnostic for one species were displayed by specimens molecularly identified as a different species. These results presented here raise questions on the reliability of species determinations in previous studies and highlights the benefits of implementing DNA barcoding prior to ecological, taxonomic and conservation studies of these important key pollinators.


Introduction
Pollination is a key ecosystem service for both natural and agricultural ecosystems. While evidence for a pollination crisis is equivocal [1,2,3], it is increasingly clear that wild pollinators, and particularly bees, are in decline worldwide [4,5,6,7]. Consequently, a key aim in insect conservation is to understand and reverse these declines. In northern temperate regions, bumblebees -as ecologically and commercially important pollinators -have been the focus of considerable conservation effort [5,8,9,10,11,12]. Conservation of any group requires an assessment of both current status and the impact of potential interventions, and this in turn requires accurate species identification.
While recent phylogenetic analyses have advanced our understanding of bumblebee taxonomy [13], Bombus species are notorious for possessing convergent colour patterns, resulting in cryptic species [14,15]. Morphologically cryptic species pose significant problems for field assessment of species richness and diversity, both of which are central to understanding current and future population trends of these important pollinators. Unfortu-nately, in Western Europe one of the apparently most common bumblebees, B. lucorum (Linnaeus), belongs to such a cryptic species complex. While B. lucorum s.l. is common throughout Europe [16,17,18], and has been commercially reared as a pollinator for greenhouse crops [19], increasing chemical [20] and molecular [20,21,22,23] evidence supports the idea that it actually represents three distinct species, B. cryptarum (Fabricius), B. lucorum and B. magnus (Vogt). The use of molecular tools in large-scale ecological studies has revealed differences in the ecology, abundance and distribution of these putative species [22,23], indicating the need to reassess the conservation status and response of these species to interventions separately.
Presently, the majority of species records are collected by members of the public using morphological and especially colour-pattern characters for identification (e.g., in the UK by the Bees, Wasps and Ants Recording Scheme, www.bwars.com and the Bumblebee Conservation Trust http://www.bumblebeeconservation.org.uk/). Therefore, there is a clear need to assess the value of colour-pattern characters in distinguishing these species. Both the scientific and popular literature suggest that such characters can be used to identify colony-founding queens in the spring [20,24,25,26,27,28,29]. Morphology based diagnosis of B. lucorum s.l. queens principally involves characteristics of the collar, labrum, ocelli and metasomal terga [20,24,30,31]. Of these, the most commonly used characters for species diagnosis are those describing the pile ('hair') of the collar (a band of yellow hairs at the anterior of the thorax) [24,25,26,27], particularly: (i) the extent of a collar extension below the tegula [24]; (ii) the presence or absence of melanisation in the collar [20]; and (iii) the presence or absence of an 'S-shape' of black hairs on the side of the collar [20]. While the extension of such traits to identifying workers has been shown to fail [23], and despite reservations that character variation may not be discrete [28], to our knowledge there has been no quantitative assessment of their accuracy in identifying queens. The main aim of this study is to use molecular tools to assess the value of these popular colour-pattern characters in biodiversity and conservation surveys of the B. lucorum species complex.
The DNA barcode, an approximately 650 bp region of the cytochrome c oxidase (COI) mitochondrial gene, has become an invaluable tool for species identification and taxonomic hypothesis generation for both ecologists and taxonomists alike [32,33,34,35]. To date over 700,000 barcode records have been generated for insects (http://www.boldsystems.org) and barcoding has been utilised across major insect orders including Lepidoptera [36,37,38,39], Coleoptera [40] Diptera [34,41,42,43], Hemiptera [44,45,46] and Hymenoptera [47,48,49]. In many cases the morphological characters traditionally used to describe a species or indicate relationships among species have been re-evaluated in the light of groupings arising from DNA barcoding studies. Evaluating species diagnostic morphological characters of genetically discrete units in such a way can indicate strengths and weaknesses in current taxonomic diagnoses of the group under study and often taxonomic revision may follow [35,44,49,50].
However, for the lucorum complex the diagnostic value of these characters has yet to be assessed quantitatively against independently determined specimens. Here we combine DNA barcoding and quantitative colour-pattern analyses to address this question at the scale of both Ireland and continental Europe.

Taxon sampling and morphological analysis
Irish specimens used in this study were obtained from a collection originally sampled in 2005 and 2006 [22]. In order to extend the study to a European scale, and investigate geographic variance, specimens were obtained from the Orkney Islands of Northern Scotland, Denmark and Finland. Field studies did not involve endangered or protected species and no specific permits were required for the collection of specimens included in this study. Permission was not required for the collection of Danish, Scottish, Finnish and the majority of Irish specimens. Collecting unlisted insect species samples from lands of the Irish National Parks and Botanic Gardens required only verbal permission, and this was obtained from the National Park Rangers for each park, and from Dr Peter Wyse Jackson, Director of the National Botanic Gardens. Queens of B. lucorum agg. were collected in all regions, except Denmark, without prior morphological identification. Danish queens were initially morphologically identified to characterise representative specimens of each putative species prior to our genetic analyses. As such, specimens that displayed characters at the extremes of the morphological ranges were most likely to occur in the Danish samples. In total 67 queens were included (Table S1) in the molecular and morphometric analyses. The Irish and Danish groups (totaling 48 queens) were also analysed separately, as they consisted of all three species (see below). Using electronic Vernier calipers (Mitutoyo Absolute Digimatic), we measured thorax width between the tegulae (body size), breadth of the collar on top of the thorax, vertical and horizontal extensions of the collar below the tegula, and breadth of the collar below the tegula. We also recorded a series of qualitative characters that have been deemed to be diagnostic for the three species: (i) the presence or absence of a black 'S-shape' on the side of the collar; (ii) the presence or absence of melanisation in the collar; and (iii) the presence of a collar extension below the tegula. To account for variation in quantitative morphological characters that may be affected by animal size, relative sizes were calculated using thorax breadth as a measure of body size. Morphological data were analysed using either one-way ANOVAs with Tukey's post-hoc test for pairwise differences, or G-tests. All analyses were conducted using SPSS v. 16.0. Data are shown as boxplots as we are interested in the biological variation in morphological traits.

DNA extraction and amplification
Total DNA was extracted from the mid or hind legs using the DNeasy Blood & Tissue extraction kit (Qiagen), with a modified protocol. Legs were ground in a 1.5 ml Eppedorf tube comprising 200 ml PBS with a disposable polypropylene pestle connected to a hand held motor (Sigma Aldrich). Then 180 ml of Buffer AL and 20 ml Proteinase K (Qiagen kit components) were added and the sample was homogenized further prior to incubation at 70uC for 30 mins. Thereafter, 200 ml of 100% ethanol was added and the remaining extraction followed the manufacturer's protocol.
For amplification and sequencing of the COI region the primers LCOHym (59-TATCAACCAATCATAAAGATATTGG-39; [51]) and NancyShort were used (59 CCCGGTAAAATTAAAA-TATAAAC-39; [52]. PCRs were carried out in 20 ml volumes using 16 PCR buffer (Invitrogen), 2.5 mM MgCl 2 , 0.2 mM of each primer, 0.2 mM of each dNTP, 1 U Taq polymerase (Invitrogen) and approximately 50 ng template DNA. PCR reaction conditions included denaturation at 94uC for 240 s followed by 32 cycles of 60 s at 94uC, 45 s at 48uC, 60 s at 72uC and a final extension at 72uC for 420 s in a Peltier thermal cycler (PTC 200; MJ Research). PCR products were sequenced in the School of Natural Sciences, Trinity College using an ABI 3130xl capillary automated sequencers (Applied Biosystems Inc.). The same PCR primers were used for sequencing. Forward and reverse sequence reads were view and assembled into contiguous sequences using BioEdit v7.0.9 [53]. All polymorphic sites were visually inspected to ensure correct base calling and sequences were aligned manually. The electropherogram, sequence and specimen data were submitted to BOLD (Accessions JCLUC001-11 to JCLUC-067-11 http://www.barcodinglife.org) and are available in the ''Irish Bombus lucorum'' project folder of BOLD. Sequences were also deposited in GenBank (Accessions JN872566 to JN872632).

Sequence Analysis
The final aligned matrix was imported into MEGA version 4.0 [54] for phylogenetic analysis. No gaps were present and translations of the sequences indicated the absence of stop and nonsense codons. After sequence trimming a total of 642 base pairs were available for analysis. Nucleotide positions were determined using the full COI gene sequence from the mitochondrial genome of B. hypocrita (GenBank accession no. NC011923; positions 1996 to 3555) and the mouse COI reference sequence (derived from GenBank accession no. AK166798). Sequence divergences were calculated using the Kimura 2-parameter (K2P) distance model [55] using MEGA 4. Bayesian analysis was set and implemented using Bayesian  [13] and Murray et al., [22] which indicate its sister group status with respect to the taxa included in this study. The analysis was run for 10,000,000 generations with sampling of trees every 1000 generations. After termination the Markov chain Monte Carlo (MCMC) output was analysed using TreeAnnotator v.1.5.4 (http://beast.bio.ed.ac.uk) to produce a consensus tree from the post burn-in tree sample (burn-in 1000) with a posterior probability of 0.5, targeting the maximum clade credibility tree and keeping the target node heights. All trees were opened and viewed in FigTree v 1.3 (http://tree.bio.ed.ac.uk/ software/figtree/). Haplotype variation, diagnostic sequence distribution and inter-and intra-specific divergences were determined using the HapMap application of the iBarcode website (http://www.iBarcode.org/).

Results
The COI barcode was sequenced for 67 representatives of the lucorum complex with the majority of the barcode sequence being obtained for each. The reference DNA barcode begins 58 bp from the 59-end of the mouse COI gene, which corresponds to position 52 of the B. hypocrita COI gene. In total, 642 nucleotides were available for analysis. Bayesian analyses resulted in a tree ( Figure 1) with three distinct and well supported clades, each of which we assume represents one of the three species. Of the 37 Irish specimens, 22 were B. magnus, 8 were B. cryptarum, and 7 were B. lucorum. The Danish B. magnus resolved as a sister group to Irish B. magnus (Clade 1; 1.00 Bayesian Posterior Probability, (BPP)), differentiated by two polymorphisms that represent synonymous substitutions. Danish, Irish and Orkney Island B. cryptarum resolved as a monophyletic group with Finnish representatives, resolving as a sister group to the main group (Clade 2; 0.99 BPP). Finnish specimens possessed 8 polymorphisms with respect to their Irish/ Orkney/Danish counterparts. Only a single polymorphism (C/T at position 391) was evident within the B. lucorum clade (Clade 3; 1.00 BPP) and this polymorphism was shared by two Finnish specimens (T904 and T909).
In total, there were 50 variable characters of which 41 were deemed to be informative. Haplotype mapping using iBarcode web resources indicated that 16 haplotypes were present across all species. However, a number of these are derived from unique nucleotide polymorphisms found in a single specimen. Such haplotypes were discounted from further consideration and specimens with a single polymorphism were included in the counts for the most similar haplotype. Two distinct haplotypes are evident for B. magnus and B. cryptarum, where the haplotypes correspond to geographically distinct groups. Of the seven nucleotide differences between the two B. cryptarum haplotypes, four are shared between the Finnish B. cryptarum and both B. magnus and B. lucorum. Although a number of diagnostic nucleotide characters were evident for each species, when each was compared to the other two species only two nucleotide positions can be deemed diagnostic for species delineation across the entire

Morphological analysis
Quantitative measurements and qualitative scores were obtained for a range of morphological characters that have been suggested to be diagnostic for species or of taxonomic importance for B. lucorum s.l..

Body size
While there was considerable overlap among species in body size (Fig. 2a), there were nevertheless significant differences among the species in mean thorax breadth between the tegulae (Tables 2,  3). Post-hoc tests showed that B. magnus was bigger than B. lucorum ( Table 2, P = 0.035) and B. cryptarum ( Table 2, P = 0.043), but there was no difference between the latter two species (P = 0.943; Fig. 2a, Table 2). Analysing the Irish specimens on their own, there was still a significant difference in body size (Tables 2, 3), with post-hoc tests showing that B. magnus was bigger than B. lucorum ( Table 2, P,0.001) but not B. cryptarum (Table 2, P = 0.438), and that B. cryptarum was bigger than B. lucorum ( Table 2, P = 0.014; Fig. 2b). Finally, analysing the Danish specimens on their own, there was no significant difference in body size across all species or in any of the pairwise comparisons (Tables 2, 3, all P.0.2; Figure 2c).

Absolute collar breadth on top of thorax
There were no significant differences in absolute collar breadth on top of the thorax among species (Tables 2, 3). This was true for the entire data set, and for independent analyses of the Irish and Danish specimens (Tables 2, 3). Results were the same when collar breadth was re-scaled relative to body size (Tables 2, 3).

Length of collar extension below tegula
As with body size, despite considerable overlap among species, there were significant differences among species for the length of the collar extension below the tegula on the left (F 2,63 = 76.059, P,0.001, explaining 70% of the variance among specimens) and right (F 2,63 = 71.046, P,0.001, explaining 69.3% of the variance) sides of the thorax (Table 2), as well as the average extension across both sides (F 2,63 = 93.378, P,0.001, explaining 74.8% of the variance). All pairwise differences were significant (all P,0.001), with the collar of B. magnus extending significantly further below the tegula than that of B. cryptarum, which itself had a longer absolute extension of the collar than B. lucorum (Table 2). Results were qualitatively similar for the Irish and Danish data sets when analysed on their own (statistics not shown, Table 2), although in the Danish data, only B. lucorum had a consistently significantly shorter collar extension than B. magnus. Results were qualitatively the same when length of the collar extension was rescaled relative to body size (Tables 2, 3).

Breadth of collar extension below tegula
Again, despite large overlap among species in this character, there were significant differences in the breadth of the collar extension below the tegula on the left (F 2,63 = 29.667, P,0.001, explaining 48.5% of the variance among specimens) and right (F 2,63 = 30.492, P,0.001, explaining 49.2% of the variance) sides of the thorax (Table 2), as well as the average breadth across both sides (F 2,63 = 37.293, P,0.001, explaining 54.2% of the variance). All pairwise differences were significant (all P,0.04) with the collar of B. magnus being broader below the tegula than that of B. cryptarum, which itself was broader than B. lucorum (Table 2). Results were qualitatively similar for the Irish and Danish data sets when analysed on their own (statistics not shown, Table 2), although in the Danish data only B. lucorum had a consistently significantly narrower collar extension than B. magnus. Results were qualitatively similar when breadth of the collar extension was rescaled relative to body size (Tables 2, 3).  One of the key characteristics that has been suggested as a way to identify the three species is the combination of relative breadth and length of the collar extension below the tegula. However, a plot of these two morphological variables against each other (Fig. 3) shows that they cannot unambiguously differentiate between the three species at either continental or local scales. At the European scale, all three species overlap in this combination of characters, with particularly strong overlap between B. cryptarum+B. lucorum and B. cryptarum+B. magnus respectively (Figure 3a). In the Danish samples, which were identified morphologically prior to molecular analyses, a similar degree of overlap is observed (Figure 3c). In contrast, the Irish samples suggest possible differentiation between B. cryptarum and B. lucorum using this combined character, although B. cryptarum and B. magnus still overlap (Figure 3b). Figure 4 shows examples of this overlap, with molecularly identified specimens from each species falling into all 3 of the colour-pattern groups.

The ''S-shape''
Queens of B. cryptarum were significantly more likely to have an ''S-shape'' than either B. lucorum or B. magnus (left-hand side of the thorax: G = 21.544, P,0.001; right-hand side of the thorax: G = 21.162, P,0.001). However, only 64% and 57% of B. cryptarum queens had the ''S-shape'' on the left and right sides, respectively (Table 4). While Irish queens were more likely to possess the S-shape if they were B. cryptarum (left: G = 7.488, P = 0.024; right: G = 6.678, P = 0.035), only 57% of B. cryptarum queens exhibited this marker while up to 18% of B. magnus possessed an S-shape, depending on thorax side (Table 4; see Figure 4, specimen g for an example of a B. magnus queen with an ''S-shape''). Unsurprisingly, given that the animals were morphologically identified on the basis of the S-shape prior to sequencing, 100% of Danish B. cryptarum queens had the S-shape on the left and right sides of their thorax (G = 18.249, P,0.001 for both sides), while specimens of the other two species had no S-shape (Table 4).

Extension directly beneath the tegula
There was a significant difference among species in the possession of a collar that extended along the thorax below the tegula (left: G = 15.868, P,0.001; right: G = 25.490, P,0.001), a feature which is claimed to be diagnostic for B. magnus. While both B. magnus (48%) and B. cryptarum (22%) possessed this character state on the left side of the thorax, only B. magnus possessed it on the right side (44%) ( Table 4). Only B. magnus queens from the Irish data set had a collar extension below the tegula. 45% of these queens had an extension on the left side, while 50% had it on the right side (G = 10.878, P = 0.004; G = 12.308, P = 0.002, respectively) ( Table 4). There was significant variation among species in the Danish sample for the presence of a collar extension below the tegula on the left-hand side of the thorax (G = 12.391, P = 0.002). While 3 out of 5 B. magnus queens had this trait, all of the B. cryptarum queens possessed it (5/5). Only one of the Danish queens had a collar extension below the tegula on the right side, and this was a B. magnus (Table 4).

Melanisation within the collar
There was a significant difference among species in the presence of melanised hairs scattered within the otherwise yellow collar (G = 9.890, P = 0.007; Table 4), another trait suggested to be diagnostic for B. cryptarum. Overall, 70% of B. cryptarum queens possessed this state, as opposed to 25% of B. lucorum queens and 33% of B. magnus queens. However, in the Irish dataset there was no relationship between melanisation and species (G = 1.738, P = 0.419; Table 4). Again, unsurprisingly, melanisation was present in all of the morphologically identified Danish B. cryptarum queens, only 1 of the B. lucorum queens and none of the B. magnus queens (G = 14.623, P = 0.001; Table 4).

Discussion
Species determination is central to most assessments of biodiversity. Here we use DNA barcoding to show that currently recommended morphological traits for identifying a cryptic species complex of bumblebees are unreliable at both local and European scales. Below, we discuss these results in detail before assessing the impact they have on bumblebee conservation in western Europe.
COI barcode analysis indicated that three distinct molecular operational taxonomic units (MOTUs) exist within the European B. lucorum complex, a finding congruent with previous studies based on molecular genetics [20,21,22,23] and secretions of the male labial glands [20]. Clade structure and genetic distances obtained in this study were similar to those obtained in previous studies [21] with intraspecific and interspecific genetic distance values of 0.001-0.004 and 0.033 to 0.044 respectively. Intraspecific and interspecific values obtained by Murray (2008) and Bertsch (2009) were 0.002-0.004 and 0.046-0.067 respectively. These distance values clearly indicate a significantly greater level of divergence between the three taxa and although the use of DNA sequences for species discovery or recognition is a contentious issue [57,58,59] our results are consistent with the view that the Northern European B. lucorum complex is comprised of three species.
The highest amount of intraspecific COI sequence variability was observed for B. cryptarum, a result consistent with previous studies [22]. There is a clear distinction between Finnish B. cryptarum and those sampled from Denmark, Scotland and Ireland. This differentiation, which corresponds to two distinct haplotypes, is supported by nine polymorphisms. This level of variation warrants further attention to evaluate the systematic significance of such divergence. A similar distinction was observed between Danish and Irish B. magnus with both forming monophyletic groups sister to each other. Danish B. magnus possessed two diagnostic polymorphisms with respect to their Irish counterparts (Table 1). B. lucorum demonstrated the lowest levels of intraspecific variation with only a single polymorphism observed across the Danish, Finnish, Irish and Orkney Isles specimens. These results  may imply species differences in natural migration rates, and thus population structure across Europe, but this needs to be investigated using more appropriate markers (e.g., microsatellites). Such differences have obvious conservation implications given the level of habitat fragmentation in the European landscape. The discrete differences between the three species obtained from the DNA barcode analysis were not replicated when morphological characters were analysed. Based on the characters most commonly cited as distinguishing these taxa, results from the morphological analyses failed to reveal a character that could reliably differentiate queens of the three species. Neither of the two most definitive character states -the presence of an S-shape on the collar, or extension of the collar under the wings -was consistently found in the species with which it has been associated as claimed previously [20]. While there were clearly significant differences among species in the presence of such traits, or in the mean values of more quantitative traits, there was significant overlap among species. While previous studies and bumblebee identification guides have claimed that specific morphological features can be used to identify queens of these three cryptic species, few [23,28] have quantitatively examined such claims, and these studies had no independent diagnostic of species identification. Our results agree with Williams' [28] suggestion that these supposedly ''diagnostic'' characters overlap considerably and vary along a morphological continuum. Consequently, they are of limited use for morphological diagnosis of queens of B. cryptarum, B. lucorum and B. magnus.
The majority of characters used for bumblebee species differentiation in ecological studies are based on patterns relating to hair, which for bumblebees can be highly variable within species. Moreover, hair colour patterns often demonstrate a high degree of convergence and even Muellerian mimicry between species [15]. This can be clearly seen in our results (Fig. 4), where hair patterns are clearly variable and form a continuum across species.
In contrast to our study, previous molecular studies [20,21] report specimens identified to species using COI sequencing that consistently display ''species diagnostic'' morphological characters. However these studies (and others) may have suffered from insufficient or selective sampling (it may be common for collectors to select 'typical' specimens for study, either intentionally or otherwise). The number of specimens included in the Bertsch et al's., study [20] was very low (n = 2 sequences per species). In total, 28 lucorum complex queens were included in the molecular analysis of Bertsch [21] and although 18 distinct geographic regions were represented in that study, the low sample size (max n = 4, with the majority of locations represented by n = 1) could not result in a conclusive evaluation of the morphological variance that exists within a given locality. The presence of clear geographical variation in colour patterns across species, as found in our study, suggests that additional investigations across the range of these species involving a randomly collected and large numbers of queens would be highly valuable.
The ability to conduct meaningful ecological and population genetic studies of particular taxa is reliant on our ability to correctly identify species. The results of our study clearly indicate that the potential for misidentification within the B. lucorum complex is high, particularly when dealing with randomly sampled large collections. Consequently, studies involving B. lucorum s.l. conducted prior to the utilisation of molecular methods for species identification may be subject to erroneous interpretation. The work of Waters et al [23] highlights the problems associated with field-based identifications of workers from the B. lucorum complex. The authors recommend the accompaniment of ecological studies with molecular methods for species identification. While this provides a solution, it is unlikely to be one that can be used for large-scale abundance and diversity studies. We hope that by demonstrating the problems with current characteristics used for queen species identification in the B. lucorum species complex, we will motivate the discovery of accurate morphological markers. Finally, we have shown that DNA barcoding for species identification can provide the basis for such quantitative studies.

Supporting Information
Table S1 Accession information and geographic location of B. lucorum complex specimens utilised in the molecular and morphological analyses. Additional information for each specimen has been linked to the BOLD ID and deposited in the BOLD database. Abbreviations used: NBG: National Botanic Gardens