Isolation of Hox Cluster Genes from Insects Reveals an Accelerated Sequence Evolution Rate

Among gene families, the Hox genes, and among metazoan animals, the insects (Hexapoda), have attracted particular attention in studies of the evolution of development. Surprisingly, however, Hox genes have not yet been isolated from 26 of the 35 insect orders, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera). We designed insect-specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-A, Abd-B, Dfd, and Ubx) from six insect orders that are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success, with insects showing the highest rate of homeobox sequence evolution.

Despite the importance of insects as the largest animal group on earth, and of Hox genes as the most influential gene class in EvoDevo research, Hox genes have so far been isolated from only 8 of some 35 insect orders. The full repertoire of Antennapedia genes has so far been reported only for Folsomia candida, Tribolium castaneum and Drosophila melanogaster. The majority of all sequences derive from only two orders, the Hymenoptera and the Diptera. In Drosophila melanogaster the Hox cluster is split into two separate units: (a) the Antennapedia complex, consisting of the Hox genes labial (lab), proboscipedia (pb), Hox3 (z2, zen, bcd), fushi tarazu (ftz), Deformed (Dfd), Sex combs reduced (Scr) and Antennapedia (Antp), and (b) the Bithorax complex, which includes Ultrabithorax (Ubx), abdominal-A (abd-A) and Abdominal-B (Abd-B) [17,18,19]. This split is likely an autapomorphy of the Diptera, since all of the above-mentioned genes may be linked in a single cluster in other insects, e.g. Coleoptera [8,20-24].
It is highly unfortunate that very little is known about Antp genes in basal insects and that the origin and radiation of Hox genes in insects remain largely unresolved. Marden et al. [25] highlight the crucial importance of isolating Hox genes, particularly from basal Pterygota, in order to reveal intermediate stages in the evolution of appendages and to shed light on the early evolution of flying insects. Here we report the successful isolation of 37 new homeobox fragments from six insect orders of crucial phylogenetic position: the apterygote Diplura and Archaeognatha, and the pterygote orders Ephemeroptera, Odonata, Plecoptera, and Dermaptera. We furthermore show that the rate of homeobox sequence evolution in the fastest-radiating animal group, the insects, has been faster than in non-insects.

Animal Material and DNA Extraction
Specimens of Campodea fragilis (Diplura) and Lepismachilis y-signata (Archaeognatha) were kindly supplied by Karen Meusemann (ZFMK Bonn, Germany). Sympetrum sanguineum, Ischnura elegans (both Odonata) and Baetis sp. (Ephemeroptera) were collected at a small pond close to our institute in Hannover. The Nemoura cinerea (Plecoptera) sample was kindly supplied by the National Museum Prague (Czechia), and Forficula auricularia (Dermaptera) was found in a private garden in Hannover. Tissue samples (legs of S. sanguineum, otherwise whole animals) were preserved in 80% ethanol and stored at 4 °C. Whole genomic DNA was extracted according to Hadrys et al. [26,27]. (No specific permits were required for the described field studies: the locations are not privately owned or protected in any way, and the field studies did not involve endangered or protected species.)

PCR Amplification
Partial homeobox sequences of the genes Deformed (Dfd), Sex combs reduced (Scr), Ultrabithorax (Ubx) and abdominal-A (abd-A) were amplified by PCR with degenerate primers. We designed "insect-specific" degenerate primers that specifically amplify partial homeobox sequences of 120 to 164 bp of the target genes (Table 1). In addition, homeobox sequences were amplified with various combinations of the four degenerate forward primers and five degenerate reverse primers reported in Cook et al. [28].
"Insect-specific" degenerate primer PCR. Reactions were carried out in a total volume of 30 µl containing 40 pmol of each primer pair, 3.3 mmol of dNTP mix, and 1.5 U of Taq polymerase (Invitrogen). PCR started with an initial denaturation (93 °C for 2 min), followed by 45 amplification cycles of denaturing at 92 °C for 30 sec, annealing at 55 to 75 °C (optimized for each primer pair and organism) for 35 sec, and elongation at 72 °C for 30 sec. All PCRs finished with a final elongation at 72 °C for 5 min. PCR products were purified with Montage PCR Centrifugal Filter Devices (Millipore).
Degenerate primer PCR [28]. The 50 µl reaction mix contained 1× amplification buffer, 4 mM MgCl2, 0.2 mM dNTPs, 10 pM of each primer and 0.04 U Taq DNA polymerase (Bioline). The ramp-up PCR started with an initial denaturation (95 °C for 5 min), followed by 6 amplification cycles of denaturing at 94 °C for 45 sec, annealing starting at 48 °C for 10 sec followed by a ramp to 56 °C (0.1 °C/sec) and a ramp to 72 °C (0.2 °C/sec), and elongation at 72 °C for 10 sec. These were followed by 30 amplification cycles of denaturing at 94 °C for 30 sec, annealing starting at 53 °C for 10 sec followed by a ramp to 62 °C (0.1 °C/sec), and elongation at 72 °C for 30 sec; the program finished with a final elongation at 72 °C for 5 min. PCR products of the expected length (approximately 70 to 100 bp) were cut out of the gel and purified by ethanol precipitation.
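The ramp-up program involves arithmetic that is easy to get wrong when transferring it to a thermocycler. The minimal Python sketch below converts the programmed ramps into seconds and sums the step times. The helper names are hypothetical. The sketch assumes that only the stated holds and programmed ramps take time. It ignores the instrument's default transitions between the other steps, so the result is a lower bound rather than the real run time.

```python
def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time for a programmed temperature ramp at a constant rate."""
    return abs(end_c - start_c) / rate_c_per_s


# Phase 1 (6 cycles): 94 C 45 s; 48 C 10 s; ramp to 56 C at 0.1 C/s;
# ramp to 72 C at 0.2 C/s; 72 C 10 s.
PHASE1_STEPS_S = [45, 10, ramp_seconds(48, 56, 0.1), ramp_seconds(56, 72, 0.2), 10]

# Phase 2 (30 cycles): 94 C 30 s; 53 C 10 s; ramp to 62 C at 0.1 C/s; 72 C 30 s.
PHASE2_STEPS_S = [30, 10, ramp_seconds(53, 62, 0.1), 30]

INITIAL_DENATURATION_S = 5 * 60  # 95 C for 5 min
FINAL_ELONGATION_S = 5 * 60      # 72 C for 5 min


def programmed_seconds() -> float:
    """Lower bound on run time: holds plus programmed ramps only.

    Instrument-default transitions (e.g. 94 C down to the annealing
    temperature, or 62 C up to 72 C in phase 2) are not counted.
    """
    return (INITIAL_DENATURATION_S
            + 6 * sum(PHASE1_STEPS_S)
            + 30 * sum(PHASE2_STEPS_S)
            + FINAL_ELONGATION_S)


print(f"phase 1 cycle: {sum(PHASE1_STEPS_S):.0f} s")
print(f"phase 2 cycle: {sum(PHASE2_STEPS_S):.0f} s")
print(f"programmed total: {programmed_seconds() / 60:.1f} min")
```

Under these assumptions, each programmed ramp adds 80 to 90 s per cycle. The programmed steps alone come to just under two hours.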

Cloning and Sequencing
The purified products were A-tailed, ligated into the pGEM-T plasmid vector (Promega) and cloned into E. coli (Invitrogen) following the manufacturer's instructions. Clones were sequenced in both directions on an ABI PRISM 310 Genetic Analyzer (Applied Biosystems) using the BigDye® Terminator Cycle Sequencing Kit (v1.1, Applied Biosystems). Sequences were analyzed and aligned using SeqMan II 5.03 (DNAStar, Lasergene) and ClustalW [29].

Calculating Divergence Rates
To infer rates of molecular evolution of insect Hox genes, p-distances within groups were calculated using MEGA5 [30]. These divergence rates of insect Hox genes were compared with divergence rates calculated for other arthropod classes and for Mammalia (see Table S1 for GenBank accession numbers). Information from the fossil record was used to estimate the absolute rates (in % per million years) at which the different lineages have accumulated mutations in their homeobox sequences.
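A p-distance is the proportion of aligned sites at which two sequences differ. The Python sketch below shows a within-group calculation and its conversion to a rate in % per million years. It is an illustration, not MEGA5's implementation. It assumes aligned sequences of equal length and skips sites with a gap in either sequence, which is similar in spirit to pairwise deletion. It also assumes that the rate is the mean p-distance divided by a single fossil-based age. Some authors divide by twice the age to count both descending lineages, and the text does not specify the convention used. The function names, toy sequences and the 400 My age are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites, ignoring sites with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped sites for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_within_group_p(seqs: list[str]) -> float:
    """Average p-distance over all pairs of sequences in one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def rate_percent_per_my(mean_p: float, fossil_age_my: float) -> float:
    """Assumed convention: rate = 100 * p / age, in percent per million years."""
    return 100.0 * mean_p / fossil_age_my


# Toy example; these are not real homeobox sequences.
group = ["AAAA", "AAAT", "AATT"]
mean_p = mean_within_group_p(group)        # (0.25 + 0.5 + 0.25) / 3
rate = rate_percent_per_my(mean_p, 400.0)  # hypothetical 400 My minimum age
print(f"mean p = {mean_p:.3f}, rate = {rate:.4f} % per My")
```

Because fossils give minimum ages, dividing by them tends to overestimate the rate. The underestimation of p discussed later pushes the estimate in the opposite direction.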

Results and Discussion
In this study we isolated the first homeobox sequences of Hox cluster genes from six insect orders: Diplura (lab, Dfd, Scr, Antp, ftz, abd-A, Abd-B), Archaeognatha (Dfd, Scr, Antp, Ubx, abd-A, Abd-B), Ephemeroptera (Dfd, Scr, Antp, Ubx, abd-A, Abd-B), Odonata (lab, pb, Hox3, Dfd, Scr, Antp, Ubx, abd-A, Abd-B), Plecoptera (Dfd, Scr, Antp, ftz, Ubx, Abd-B), and Dermaptera (Dfd, Scr) (amino acid alignments are shown in Fig. 1). These 37 new sequences (Table S1) fill crucial gaps both at the base of the insects and at the base of the Pterygota (Table 2). The new data raise the number of insect orders with reported Hox cluster gene sequences from 8 to 14 and the number of known gene sequences in the matrix from 67 to 101. These numbers include sequences from the 8 Hox genes (lab, pb, Dfd, Scr, Antp, Ubx, abd-A, Abd-B) as well as from the two homeotic genes Hox3 (bicoid) and ftz, which are integrated in the insect Hox cluster (or clusters, in the case of Diptera).

Homology Assignment and Database Extension
Homolog identification of the isolated Hox genes is largely, but not entirely, unproblematic. All assignments shown in Table 2 are the immediate assignments according to BLAST searches. Because caution is warranted when using the best BLAST hit to infer gene homology [31-33], we performed phylogenetic analyses to further test the assignments of the newly isolated Hox genes. In these Neighbor-Joining (NJ) analyses with published homeobox sequences, the new homeobox sequences for lab, pb, Dfd, Ubx, abd-A and Abd-B group into the expected clades. The genes Scr, ftz, and Antp are generally problematic: even their full-length homeodomain sequences do not allow an unambiguous assignment in a standard distance analysis (Fig. S1). A Neighbor-Joining analysis of only the new homeobox sequences from the six potentially unambiguous genes (lab, pb, Dfd, Ubx, abd-A, Abd-B) groups all new fragments into the expected clades of homologs from other insects, except for abd-A, which appears paraphyletic, and thus confirms the results of the NCBI BLAST searches (Fig. 2). Based on the partial homeobox sequences, we cannot unambiguously distinguish between Scr and Antp homeobox fragments in Diplura, Archaeognatha, Ephemeroptera, Odonata, Plecoptera and Dermaptera. The isolated homeobox fragments differ between orders, but they show no amino acid substitutions in the short fragment spanning homeodomain positions 20 to 45 (Fig. 3). More sequence information is required to distinguish between the two alternatives for these gene fragments, since the diagnostic amino acid substitutions are known to occur only at positions 1, 4, 6, 7 and 60 (Fig. 3). We believe that we have amplified both genes (different homeobox sequences), but we are reluctant to assign them to the Scr or Antp gene family in the absence of unambiguous differences in the homeodomain.
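The Scr/Antp ambiguity depends on where the diagnostic differences sit relative to the amplified fragment. The Python sketch below lists the 1-based homeodomain positions at which two aligned sequences differ and keeps the positions that a given fragment covers. The sequences are hypothetical placeholders: a run of a single residue, with substitutions placed at positions 1, 4, 6, 7 and 60 as stated above. They are not real Scr or Antp homeodomains, and the function names are illustrative.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[int]:
    """1-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


def diagnostic_in_fragment(positions: list[int], start: int, end: int) -> list[int]:
    """Diagnostic positions inside a fragment spanning start..end (1-based, inclusive)."""
    return [p for p in positions if start <= p <= end]


def substitute(seq: str, positions: list[int], residue: str) -> str:
    """Copy of seq with the residue at each 1-based position replaced."""
    chars = list(seq)
    for p in positions:
        chars[p - 1] = residue
    return "".join(chars)


# Hypothetical 60-residue placeholders, NOT real Scr/Antp homeodomains.
placeholder_a = "A" * 60
placeholder_b = substitute(placeholder_a, [1, 4, 6, 7, 60], "G")

diffs = differing_positions(placeholder_a, placeholder_b)
in_fragment = diagnostic_in_fragment(diffs, 20, 45)  # fragment covering positions 20-45
print("diagnostic positions:", diffs)
print("covered by fragment 20-45:", in_fragment or "none")
```

A fragment that covers none of the diagnostic positions cannot separate the two genes, however well it is sequenced. This is why the text calls for full-length homeoboxes.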
For odonates, we also verified the assignment of the new gene fragments to the Scr, Antp, and Ubx gene families by RACE-PCR, amplifying full-length homeobox sequences for developmental studies (data will be reported elsewhere). The only Hox gene sequences previously isolated from Apterygota came from two orders, Thysanura and Collembola. The addition of 13 new sequences from Archaeognatha and Diplura doubles the number of apterygote insect orders with known Hox gene sequences. The Archaeognatha Hox gene sequences may provide the best available roots for Hox genes in Hexapoda, offering a reference point for estimating the speed of Hox gene sequence evolution in insects [34]. In general, the new data provide a starting point for phylogenetic and developmental studies of the apterygote-pterygote transition.
In the Pterygota, Hox gene sequences were previously known from the derived Hemiptera, Diptera, Hymenoptera, Orthoptera, Coleoptera (complete cluster) and Lepidoptera [21,35,36]. With 24 new sequences from Ephemeroptera, Odonata, Plecoptera, and Dermaptera, we add sequences particularly from phylogenetically more basal insect orders (Table S1). These data are crucial for addressing the origin of pterygote insects, i.e. the invention and radiation of an insect bauplan armed with wings. Most recent molecular phylogenetic analyses suggest a basal position for Odonata within the Pterygota [37], making odonates particularly important for unraveling the evolutionary and developmental origin of insect wings [11,19,38]. We isolated all 8 Hox genes from odonates, as well as the homeotic gene Hox3 (bicoid). Only one other homeotic, but non-Hox, gene, ftz, escaped our survey. Although we increased the number of pterygote insect orders with known Hox gene sequences from 6 to 10, there are still some 19 insect orders for which no Hox gene sequences are available (see Table 2).
The main goal of our study was to add as many new Hox cluster gene sequences as possible from phylogenetically particularly important insect orders to the database. The primer pairs used in this study proved successful for all 10 Hox cluster genes, but they did not amplify all homeobox fragments from all insect orders investigated. Filling these gaps will require a different approach and possibly different primer sets. In contrast to previously used degenerate Hox primers, our newly designed "insect-specific" primers amplify considerably longer fragments (almost full-length homeoboxes) of 120-164 bp instead of some 80 bp [39,40]. For preparing the ground for comparative studies on the evolution of the winged insect bauplan, the genes Scr, Antp and Ubx are of immediate importance [5,19,41,42,43]. We have isolated fragments of all three genes from Archaeognatha, Ephemeroptera, Odonata, and Plecoptera. If Odonata represent the most basal pterygote insects (see above), the new sequences from odonates will become indispensable for comparative studies on the evolution of Pterygota (Fig. 4 and Table S1).

Insect Hox genes in Development and Evolution
From the very beginning of embryogenesis, Hox genes control axis formation and the resulting body patterning in Bilateria (for the controversial discussion on non-bilaterian animals, see Kamm et al. [44]; Ryan et al. [45]; Schierwater et al. [16]; Schierwater and Kamm [46] and references therein). Studies on model systems have offered tremendous insights into the genetic principles of bilaterian development. Current EvoDevo research is urgently seeking comparative data from non-model animal systems, since most of the established model systems are phylogenetically quite derived. Take the invention of wings in insects, a key bauplan change that has fueled the unchallenged radiation success of pterygote insects: unraveling it requires comparative data from the base of Pterygota (Fig. 4). From higher pterygote insects we know that Scr and Ubx play key roles in wing development [19,35,42,43,47-49]. In the absence of comparative data from more basal pterygote insect orders, however, no conclusions can be drawn on the role of Scr and Ubx in the evolutionary origin of the insect wing. The new sequences from several crucial insect orders provide a first step towards obtaining the missing data.
Figure 4. … [56,57,58,59]. Macro-evolutionary events in insect evolution that are cited as major bauplan transitions are mapped on the phylogeny. Pictures are modified after [60]. doi:10.1371/journal.pone.0034682.g004
The degree to which Hox genes can also contribute directly to phylogenetic analyses has been controversially discussed [28]. The genomic organization of Hox genes has supported several important clades at higher taxonomic levels [50,51,52]. At the sequence level, the homeobox or homeodomain may also carry phylogenetic signal at lower taxonomic levels [34,53]. The main limitation is the shortness of the sequence, whereas the main strength is that it can be aligned without difficulty [54].

Insect Homeoboxes have Radiated Faster than Non-insect Homeobox Sequences
The addition of 37 new insect homeobox sequences allows us to test the hypothesis that increased radiation success correlates with an increased rate of sequence evolution in the regulatory Hox genes.
In 1965 Zuckerkandl and Pauling [55] suggested that mutations accumulate over time and that genetic divergence could therefore be used to estimate the time of split between clades, which gave rise to the idea of the molecular clock. We calculated the absolute rate of sequence evolution in the regulatory Hox genes. Fossils provide the best estimates for the minimum age of a specific group, so we used the fossil record to estimate the rate (in % per million years) at which the different lineages have accumulated mutations in their Hox genes.
Comparison of p-distances within and between groups revealed significantly faster sequence evolution in insects than in other arthropods and mammals (Table 3). The average sequence evolution rate of Hox gene homeoboxes in insects is estimated at 0.06 ± 0.003% per million years (mean ± SE), significantly higher than in non-insects (0.04 ± 0.02; p < 0.001, two-sided U-test).
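The group comparison relies on a two-sided U-test, presumably the Mann-Whitney U test. The sketch below is a self-contained Python version. It assigns average ranks to ties and uses the large-sample normal approximation without tie or continuity correction. This is one common way to compute the test; the authors do not state which variant or software they used. The function names are illustrative. The input rates are invented for illustration and are not the values underlying Table 3.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for x, two-sided p) via the normal approximation (no tie/continuity correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_two_sided


# Invented example rates (% per My), NOT the data behind Table 3.
insect_like = [0.050, 0.060, 0.070, 0.065, 0.055, 0.058, 0.062, 0.068]
other_like = [0.020, 0.030, 0.035, 0.025, 0.040, 0.045, 0.030, 0.038]
u, p = mann_whitney_u(insect_like, other_like)
print(f"U = {u:.1f}, two-sided p = {p:.5f}")
```

A rank-based test suits this setting because it does not assume that the per-lineage rates are normally distributed. For very small samples, an exact test would be preferable to the normal approximation.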
Two important aspects must be kept in mind when interpreting these sequence evolution rates. First, the Hox gene homeobox sequences available for the different groups do not necessarily reflect the overall diversity of each group. For Bivalvia, sequences are available from 5 of the 10 recognized orders, for Cephalopoda from 3 of 11, for Gastropoda from 4 of 23, for Maxillopoda from 5 of 14, and for Insecta from 11 of 32. In addition, homeobox sequences are not always complete, which can lead to an underestimation of the overall p-distance. Second, p-distances do not account for substitution rate biases, rate variation among sites, or multiple substitutions at the same site. All of the above can lead to an underestimation of sequence evolution rates. Nevertheless, the exceptionally high sequence evolution rate in insects supports the hypothesis that the unchallenged radiation success of insects, and particularly of flying insects, coincides with an increased sequence evolution rate in the Hox genes. The Hox genes are the most important regulatory genes for the anterior-posterior (a-p) bauplan setup. Based on previous experience with short homeobox fragments, the probability of misclassification should be low [39,40]. Misclassification should therefore not contribute significantly to the observed high rates of sequence evolution in insects.
Figure S1. Neighbor-Joining tree of all previously known Scr, ftz, and Antp sequences from those insect orders for which the complete set of Hox gene homeobox sequences is known: Folsomia candida (Colle), Drosophila melanogaster (Dipt) and Tribolium castaneum (Coleo). Even the full-length homeobox sequences allow no unambiguous grouping (see text).
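Multiple substitutions at the same site cause uncorrected p-distances to underestimate divergence, and a standard distance correction makes this concrete. The sketch below applies the Jukes-Cantor formula, d = -b ln(1 - p/b), with b = 3/4 for nucleotides; b = 19/20 is the usual choice for amino acids. It illustrates the direction of the bias only. It is not the correction used in this study, which reports uncorrected p-distances.

```python
import math


def jukes_cantor(p: float, b: float = 0.75) -> float:
    """Jukes-Cantor corrected distance from an uncorrected p-distance.

    b = 3/4 for nucleotides; 19/20 is commonly used for amino acids.
    The correction is undefined when p >= b (saturation).
    """
    if not 0.0 <= p < b:
        raise ValueError("p must satisfy 0 <= p < b")
    return -b * math.log(1.0 - p / b)


for p in (0.05, 0.10, 0.30):
    print(f"p = {p:.2f}  ->  d = {jukes_cantor(p):.4f}")
```

The corrected distance always exceeds p, and the gap widens as p grows: for p = 0.30 the corrected value is about 0.38. Rates based on uncorrected p-distances are therefore conservative.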