Molecular evolution of cytochrome C oxidase-I protein of insects living in Saudi Arabia

The study underpins barcode characterization of insect species collected from Saudi Arabia and explored functional constraints during evolution at the DNA and protein levels to expect the possible mechanisms of protein evolution in insects. Codon structure designated AT-biased insect barcode of the cytochrome C oxidase I (COI). In addition, the predicted 3D structure of COI protein indicated tyrosine in close proximity with the heme ligand, depicted substitution to phenylalanine in two Hymenopteran species. This change resulted in the loss of chemical bonding with the heme ligand. The estimated nucleotide substitution matrices in insect COI barcode generally showed a higher probability of transversion compared with the transition. Computations of codon-by-codon nonsynonymous substitutions in Hymenopteran and Hemipteran species indicated that almost half of the codons are under positive evolution. Nevertheless, codons of COI barcode of Coleoptera, Lepidoptera and Diptera are mostly under purifying selection. The results reinforce that codons in helices 2, 5 and 6 and those in loops 2–3 and 5–6 are mostly conserved and approach strong purifying selection. The overall results argue the possible evolutionary position of Hymenopteran species among those of other insects.


Introduction
The fact that many insect species are difficult to be discriminated at the morphological level, as well as the huge number of cryptic species, makes the global species count uncertain [1]. The adoption of DNA-based molecular markers represents a satisfactory alternative. Since the proposal of DNA barcoding in 2003, subunit I (658 bp) of the mitochondrial cytochrome C oxidase (COX) gene (namely COI) became the most universal marker for species identification in the animal kingdom [2,3]. The recently developed Barcode Index Number (BIN) system [4] ( Ratnasingham and Hebert, 2013) can act as a powerful alternative to morphological species a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 that easily distinguishes the occurrence of diversity or possible speciation [5,6,7,8,9,10,11,12]. Technically speaking, this system complements the molecular-based approaches of species identification to strengthen and support the evolutionary analysis in insects.
Documentation of insect diversity in the Sahara-Arabian region including Saudi Arabia has recently taken place [13] as the biodiversity data available for this region is insufficient, compared with those in Canada or the US. Documentation based on the criteria of the International Union for Conservation of Nature allows recognizing invasive alien species or list threatened species. Malaise traps were successfully used in scoring richness of insect species and biodiversity surveillance in regions that are difficult-to-access [14] and/or in the absence of formal taxonomic assignments [15].
COX enzyme functionally participates in the electron transport chain by reducing oxygen and pumping protons across the inner mitochondrial membrane. The encoded protein of COI comprising 219 amino acids (AAs) consists of six polypeptide chains and a few metallic ligands. The latter includes two iron atoms bound in two heme groups, three coppers, one zinc and one magnesium [16,17]. Changes in the AA sequence within the COI region likely reduce cellular energy metabolism, especially when changes occur close to the active sites of the enzyme [18]. Purifying selection predominantly occurs for animal mitochondrial protein-coding genes, thus, results in the scarce of AA substitutions, especially in the COX genes [19,20,21]. However, evidence of nonsynonymous AA substitution supports the notion of positive selection generally in animal and specifically in class Insecta. Due to the functional constraints criterion, changes should be biased to the nonfunctioning regions of the gene supporting the claim that evolution is not neutral [20,22]. Positive selection for changes in mitochondrial proteins during evolution could be caused by lifestyle changes, and this has been observed in snakes changing their metabolic rate to complement lifestyle [21].
In this work, we introduce a model study on the COI subunit of specimens of six insect orders collected from Saudi Arabia to generate DNA barcodes and detect variation among species and orders at the AA level. Based on the latter, changes in COI protein structure at the three-dimensional model are projected. Through this model study, we gained new insights into the possible mechanisms by which COI protein evolves in insects living in Saudi Arabia.

Materials and methods
For sample collection, a Malaise trap [23,24] (Hutcheson and Jones, 1999;Hill et al., 2005) was installed at Hada Al-Sham station, King Abdulaziz University (KAU) located in the western region (near Makkah) of Saudi Arabia (21.795 o N, 39.711 o E). Samples were collected on a weekly basis for four weeks (May 1-28, 2017) into 95% ethanol and stored at -20˚C for further analysis. Specimens were morphologically classified down to the species level, then total DNAs were recovered from individual tissue samples according to Evans and Paulay [25]. PCR was performed to recover the COI gene fragment (650 bp) as described by Ashfaq et al. [13] and amplicons were shipped to Macrogen (South Korea) for Sanger sequencing. Recovered sequences were checked for quality and those meeting the standard criteria were further assigned to subsequent analysis. Good quality sequences were trimmed and algorithmically aligned with Clustal Omega with strict parameters and the resulting alignments were manually refined before translation. Deduced AA sequences were recovered consulting the invertebrate mitochondrial code. Samples with nucleotide deletions were excluded upon DNA multisequence alignment, and only in-frame AA sequences were retained. A consensus sequence of Odonata (accession no. JN294479) was used as a reference for comparison. This order is the most ancient in the hexapoda insect phenogram [26]. Binary data matrices were entered into TFPGA (version 1.3) and analyzed using qualitative routine to generate a similarity coefficient.
Dissimilarity coefficients were used to construct dendrograms using unweighted pair group method with arithmetic average (UPGMA) and sequential hierarchical and nested clustering (Neighbor-Joining or NJ) routine using NTSYSpc (version 2.10, Exeter software). Phylogeny tree generated by Bybee et al. [26], was taken as a reference ancestral order for testing insect lineage and evolution.
The AA sequence of cattle (Bos taurus) was used as a reference [27] for detecting changes of COI protein of selected species at the 3D structure level. We recruited the bovine protein Xray structure (Protein Data Bank ID 1OCC) as a homology model of the COI barcode region to build 3D structures of selected insect species using the I-TASSER Suite [28]. Distance measurements between AAs, on one side, and the two heme ligands referenced as 515 and 516, on the other side, were estimated using UCSF Chimera 1.13.1 [29]. Electrostatic potential of the COI protein was estimated using DeepView-Swiss-PdbViewer (v4.1) (http://www.expasy. org/spdbv) Nucleotide substitution patterns, e.g., synonymous (S) and nonsynonymous (N), were generated using the Estimate Substitution Matrix feature in MEGA v. 6 and estimates of the numbers of S and N substitutions per site were made using the joint Maximum Likelihood reconstructions of ancestral states [30] of codon substitution and Felsenstein model [31] of nucleotide substitution. Changes in AA sequences were scored referring to the consensus sequence of Odonata.

Results and discussion
COI barcode was analyzed at the DNA and protein levels for insects living in Saudi Arabia in order to gain new insights as to how this gene and its encoded protein evolve among different insect orders. A number of 560 samples were collected by the Malaise trap within a period of four weeks. Photographs were taken for all specimens, while the number was eventually narrowed to one photograph per species as shown in S1 Fig. DNAs were purified from these samples and the recovered mitochondrial COI gene fragments were sequenced. We projected to include only sequences with � 634 nucleotides encoding all loops and helices of the COI protein. The number of deduced AAs is 211 starting at codon position 12 referring to the numbering made by Pentinsaari et al. [32]. Based on the rigid quality control criterion, the number of samples was narrowed to 175 for further analysis. The consensus DNA sequence of Odonata was included for comparison as this order has an ancient phylogenetic position being the representative of the first ancient winged insects in the hexapoda phylogeny [26]. This order has extensively been used as a reference in contemporary evolutionary genomic studies [33]. BLAST analysis indicated that a number of 30 species representing six insect orders were collected during the course of this study. Two of which belong to Hemimetabolous, e.g., Blattodea (one species) and Hemiptera (five species), while four belong to Holometabolous, e.g., Hymenoptera (six species), Coleoptera (six species), Lepidoptera (five species) and Diptera (seven species). One of the Hymenopteran species was not precisely identified at both the morphological and molecular levels, while showed the closest relationship (98%) with Apoidea sp.
Multiple DNA sequence alignment indicated that a number of 265 nucleotides out of 634 are common (conserved) in the recovered COI barcode, while the rest varied among orders and species as shown in S2 Fig. A phylogenetic tree constructed from the sequences indicated the close relationship between the species of Odonata and Blattodea as both orders are hemimetabolous (S3 Fig). Recent research indicated that Blattodea is the closest among the six orders of the present study to Odonata (Fig 1) [26]. Interestingly, a close relationship was shown between species of Hemiptera and Hymenoptera although the first is hemimetabolous, As it is the case with animal mitochondrial DNA [32], the DNA COI barcode sequences analyzed for the six orders is AT-biased (e.g., AT content =~68%) as shown in Fig 1. The AT content was lowest (~57%) for the nucleotides in the first position of all codons, while~58% in the second position and as high as~90% in the third position. The G nucleotide has proven to be the least degenerate among the four nucleotides in the third position. Interestingly, A nucleotide in the second position was unexpectedly low (<15%), while C was high (~25%). No G nucleotide was scored in the third position for four species namely Hymenoptera sp., Belenois aurota, Asyndetus sp. and Carpomya vesuviana. The first two species are Hymenoperan and Lepidopteran, respectively, while the other two are Dipteran. The lowest C nucleotide in the third position was scored for the two Hymenoptran species Tachytes crassus (<2%) and Nomioides facilis (<1%) (Fig 1).
Amino acid sequences of COI barcode were sorted out in this study based on their chemical properties into standard groups: nonpolar aliphatic (G, A, V, L, M and I), polar uncharged (S, T, C, P, N and Q), aromatic (F, Y and W), positively charged (K, R and H) and negatively charged (D and E). The barcode sequences of different species largely encode nonpolar AAs (Fig 2). There are differences in the AA composition of the barcode sequences among orders. Alanine (A) and valine (V) are lower in barcodes of Hymenopteran species as well as of the Hemipteran species Batracomorphus angustatus compared with those of the other orders. On the other hand, lysine (K) is exclusively encoded by barcodes of the latter species although there are very few. Four species of the latter two orders) Hymenoptera and Hemiptera) also exclusively encode cysteine (C). The two AAs K and C do not exist in the sequence of cattle COI barcode [32]. We speculate that they appeared due to AA substitutions in several positions of the animal AA sequence of the barcode of Batracomorphus angustatus. Three other AAs seem to be encoded in very small amounts in few orders such as glutamine (Q), tyrosine (Y) and glutamic acid (E). Similar results were previously reached by Pentinsaari et al. [32] when studying Coleopteran and Lepidopteran species. The scarcity of these AAs in barcodes can possibly be compensated by AAs of the same chemical groups at the same position in the polypeptide chain. For example, tyrosine (Y) exists in three positions, e.g., 38, 113 and 215. The position in the middle is conserved among the six orders, while substituted to phenylalanine (F) in the other two positions. The two AAs exist in the same chemical group (aromatic). Interestingly, the five rare AAs (C, K, Q, Y, and E) are encoded by twofold degenerate codons. Cysteine in the COI region exists in four species of Hymenoptera and one species in Hemiptera although in very small amounts, while completely absent in other species of the six orders. Changes in cysteine in the barcode sequence can severely affect the secondary structure as well as the stabilization of tertiary and quaternary structures of the COI protein.
Multiple AA sequence alignment of the COI barcode of the 30 species is shown in S4 Fig. The encoded AA sequences of the DNA barcode fragment across the six orders cover 211 AAs starting from position 12 to 222 based on the numbering made by Pentinsaari et al. [32]. This portion of the barcode includes the enzymatically active part of COX mediating electron transfer from Cu to heme. Amino acid sequence variation in each position was scored in which we referred to conserved AAs by asterisks while increasing variability by reducing the number from 9 to 1 (S4 Fig). Based on the criterion followed by Pentinsaari et al. (2016), the substitution of AAs with the same chemical property at any given position is not considered to significantly influence enzyme function. This also depicts the substitution of AAs from one chemical group to the other in a position far from the enzyme ligands. However, substitutions that change the AA chemical group occur close to the enzyme ligands, may likely influence enzyme function. As indicated earlier, the secondary structure of the barcode region includes six α-helices connected by five loops (see Fig 3 in [32]). The loops encompass 62 AAs of which all the eight-plus 11 AAs of loops 1-2 and 3-4, respectively, vary among the six orders. AAs in loop 2-3 vary in three out of the eight AAs, while 10 out of 23 and five of 12 for loops 3-4 and 5-6, respectively. The loop 3-4 is important for pointing towards the heme group at the active site of the COI protein. The higher number of conserved AAs in this study was expected due to the narrow genetic distances among the six insect orders compared with those shown by Pentinsaari et al. [32] across the Metazoan barcodes. The number of the conserved AAs of the barcode sequence among the six insect order is 102 including the known 23 conserved AAs  among Metazoan [32]. Overall, AA variations occurring at positions in the helices 3, 4 and 5 likely have less influence on enzyme function compared with that in the helices 1 or 2. Besides, AA variations are more pronounced in loops than in helices.
Functional constraints during the evolution of the insect COI barcode was checked in the predicted three-dimensional structure in terms of the distances between AAs and the two heme ligands referred to as hemes 515 and 516 (Fig 3). The analysis indicated a number of 16 AAs mostly in close proximity with heme 515 (Fig 4A). These AAs exist in helices 2 (10) and 1 (4) and  rest are conserved. We selected Batrachedra amydraula as a model barcode of the insect conserved sequences for the 16 AAs in the heme 515 vicinity in which the predicted 3D structure was generated (Fig 3). We mainly focused on four AAs that directly share six bonds with hemes 515 and 516. These AAs are R (position 22), Y (position 38), H (position 45) and W (position 110). Two bonds exist between H at position 45 and heme 515 ( Fig 4B) and two other bonds exist between W and the two heme ligands 515 and 516 (Fig 3C). ) that unexpectedly showed substitution to phenylalanine in two Hymenopteran species, e.g, Camponotus maculatus and Monomorium junodi. These two AAs differ in the occurrence of an OH molecule at the side chain of tyrosine, which is lacking in phenylalanine. The predicted 3D structure of Camponotus maculatus was investigated at this position in order to detect the possible severity accompanying this substitution ( Fig 3D). Tyrosine and phenylalanine share the same chemical group, however, the structure indicates no chemical bonding between the AA and heme 515 albeit the possible occurrence of Van der Waals interaction. This is the result of polarity change due to substitution that increases the chance of splitting heme from the protein. We speculate that this change in the 3D structure of COI and its consequence can impair the process of electron transfer from Cu to heme, thus, potentially affect energy metabolism in the mitochondria. Fig 3 depicts the alignment of the predicted three-dimensional structures of COI proteins of Batrachedra amydraula and Camponotus maculatus. The figure indicates the occurrence of a hydrogen bond between tyrosine of the first insect at position 38 and Heme 515 while lacking bond between phenylalanine of the second insect at the same position (Fig 4A and 4B). The figure also underpins poor alignment of loop 1-2 or 4-5 of the two insect species (Fig 3C), which reflects the high rate of varied AA in these two regions in line with the results of Pentinsaari et al. [32] and those of the present study (S4 Fig). The three-dimensional structure of the protein also indicates that the change from tyrosine to phenylalanine resulted in the loss of bonding with heme, thus likely affect energy metabolism. The phenylalanine appears to form a proper geometric complementarity; however, lacks electrostatic complementarity due to the absence of the hydroxyl group that contributes to the electrostatic interaction with the heme (Fig 4A), resulting in an increase in hydrophobicity. This suggests that the protein retains its structure, but not all of its functions. Moreover, the absence of a single hydrogen bond, computing the electrostatic potential showed no clear difference when tyrosine is substituted with phenylalanine ( S5 Fig). Phylogenetic tree based on multiple AA sequence alignment of COI barcode indicated that Hymenoptera is the most genetically distant of all the six insect orders under study (Fig 5). The tree structure complements that of the tree generated based on multiple DNA sequence alignment (S2 Fig) in which Blattodea was shown to be closest to Odonata. The same criterion applies to the relatedness between the two orders Hemiptera and Hymenoptera and the two other orders Coleoptera and Diptera. We further analyzed the six varied AAs at positions 15,21,38,41,42 and 57, existing within the distance of < 4Å from heme 515, in terms of transitions from one biochemical group to the other (Fig 5). At position 15 of the COI protein, threonine is changed to either serine with the same chemical group (polar uncharged) or methionine (two-codon AA) with different chemical groups (nonpolar) in all species of Hymenoptera and Hemiptera, respectively. At position 21, the substitution of isoleucine to valine in the same chemical group (nonpolar) has taken place in only Melanotus villosus species. At position 38, the substitution of tyrosine to phenylalanine in the same chemical group has taken place in the two Hymenopteran species Monomorium junodi and Camponotus maculatus. At position 41, the substitution of isoleucine (two-codon AA) to either leucine (six-codon AA) or methionine (two-codon AA) in the same chemical group (nonpolar) has taken place in Hymenopteran species except for Camponotus maculatus. An extra substitution at the same position has taken place to valine also in the same chemical group (nonpolar). At position 42, the substitution of valine to isoleucine in the same chemical group has taken place in Hymenopteran species, which was not fully detected at the species level. Interestingly, substitution at position 57 from isoleucine (nonpolar) to phenylalanine in a different chemical group (aromatic) has taken place in all Hymenopteran species. The AA transitions from one chemical group to the other might result in impaired energy metabolism. The study indicated a small distance between Hymenoptera and Hemiptera in the phylogenetic tree than between Coleoptera and Hemiptera. The data of AA substitution support our speculation that Hymenoptera is the most genetically distant of all the other insect orders under study. As Hymenopteran species are shown closely related to Hemipteran species and the fact that Hymenopteran species showed AA substitutions in five out of the six varied AAs in close proximity with the heme ligand indicate that Hymenoptera might be among the oldest insect orders.
The estimated nucleotide substitution matrices in the COI barcode of the six orders generally showed a slightly higher probability of transversion compared with transition (Fig 6). The probability of transversion ranged between~40% for the Coleopteran species Melanotus villosus and~65% for the Hymenopteran species Tachytes crassus. Substitutions of A followed by T to any other nucleotide represents the lowest probability compared with the rest. Changes from C to T followed by G to A showed the highest transition frequency compared with the others. These two high transition frequencies justify the high TA ratio in insect orders and the bias against G or C during evolution. Both Coleoptera and Diptera showed the most notable bias against G.
Maximum Likelihood computations of codon-by-codon synonymous (s) and nonsynonymous (n) substitutions in COI protein were conducted using the HyPhy software package in which Odonata was used as the reference ancestral order (Fig 7). The total number of nonsy- The study indicates that COI barcodes can provide insights by which the encoded protein evolve and function in insect at different taxonomic scales. The COI or Folmer region is basically used for species identification based on variations at the DNA level [34]. The region is sufficiently conserved within species, while variable among species. This region is located in the enzymatically active part of COI, thus involved in the respiratory chain by transferring electrons from Cu to heme. Accordingly, this region is under functional constraints and mutations in close proximity with heme ought to be lethal. There is less evolutionary pressure against variation in positions within the extra-membranous loop structures (e.g., loops 1-2 at AA positions 27-34, 3-4 at AA positions 102-124 and 4-5 at AA positions 156-166). This conclusion was previously reached by Panchenko et al. [35]. Amino acids in the two highly conserved loops (e.g., loops 2-3 and 5-6) seem important in holding the two sides of the enzyme and avoid severe conformational changes. However, tryptophan within the variable loop 3-4 at position 99 rigidly evolves as it is the only AA of COI protein that bonds with the two hemes. Another AA in positions close to the active site is rigid membrane-embedded αhelices undergo functional constraints and restricted evolution (negative evolution). A similar pattern of variation was observed in the protein encoded by mitochondrial ribosomal RNA gene, where the loops freely evolve [36].
The study generally indicated that almost half the positions of the AA of the barcode region are variable, however, they are mostly beyond the atomic interaction distance of the heme groups as previously reported [27,16,37]. Six of these AAs exist within a distance of >4Å with four of them changed to AAs of the same chemical groups. However, the three-dimensional structure of the protein indicates that the change from tyrosine to phenylalanine resulted in the loss of bonding with heme, thus likely affect energy metabolism.
Estimates of dN-dS were used for detecting codons under either positive or purifying selection. dS is the number of synonymous substitutions per site (s/S) and dN is the number of nonsynonymous substitutions per site (n/N). Positive values indicate the overabundance of nonsynonymous substitutions in these positions. During evolution, positive (Darwinian) selection promotes the sweeps of new beneficial alleles, and negative (or purifying) selection impedes the spread of harmful alleles [38]. The results indicated that 102 codons are under positive selection and tend to evolve more rapidly.
Phenotypic plasticity or polyphenism is another potential player of evolutionary changes and shaping of ecosystems that allows species to phenotypically adapt to different environmental conditions during evolution [39,40,41,42]. This type of evolutionary force is not genome structure-based, rather, it relies on the differential gene expression (gene expression-biased) and is particularly prominent in insects. Gene expression bias is predominant in molecular evolution when selection becomes less effective at removing deleterious alleles. Studying polyphenism provides insight into the molecular basis of phenotypic differentiation during evolution [43,44,45,46,47,48,49,50,51]. This approach is ideal especially for cases where low genetic differences exist among species [52,53].
Hymenopteran species have the highest AA variation in line with the presumed age of the order that is likely among the oldest, thus had a long time to molecularly evolve by changing AA sequence. Previous reports speculated that differences in patterns of variation, substitution, and selection among insect orders are due to the number of metabolism-related factors that are high in actively flying insect species (like Hymenoptera) than in non-flying species [54]. Selection is biased towards the higher active metabolic rate and the increase of the resting metabolic rate. Pentinsaari et al. [32] indicated a higher relative rate of CT/GA transitions in the insect is the result of higher oxidative damage to DNA. This high variation is the result of a weak purifying selection in Hymenopteran species. We strongly recommend making wet laboratory analysis to support whether the changes in AAs consequently affect metabolism in the target species or not. The overall results argue the possible evolutionary position of Hymenopteran species among those of other insect orders living in Saudi Arabia, especially Hemiptera and Coleoptera.