Positive and Purifying Selection Influence the Evolution of Doublesex in the Anastrepha fraterculus Species Group

The gene doublesex (dsx) is considered to be under strong selective constraint along its evolutionary history because of its central role in somatic sex differentiation in insects. However, previous studies of dsx used global estimates of evolutionary rates to investigate its molecular evolution, which potentially miss signals of adaptive changes in generally conserved genes. In this work, we investigated the molecular evolution of dsx in the Anastrepha fraterculus species group (Diptera, Tephritidae), and test the hypothesis that this gene evolved solely by purifying selection using divergence-based and population-based methods. In the first approach, we compared sequences from Anastrepha and other Tephritidae with other Muscomorpha species, analyzed variation in nonsynonymous to synonymous rate ratios (dN/dS) in the Tephritidae, and investigated radical and conservative changes in amino acid physicochemical properties. We show a general selective constraint on dsx, but with signs of positive selection mainly in the common region. Such changes were localized in alpha-helices previously reported to be involved in dimer formation in the OD2 domain and near the C-terminal of the OD1 domain. In the population-based approach, we amplified a region of 540 bp that spanned almost all of the region common to both sexes from 32 different sites in Brazil. We investigated patterns of selection using neutrality tests based on the frequency spectrum and locations of synonymous and nonsynonymous mutations in a haplotype network. As in the divergence-based approach, these analyses showed that dsx has evolved under an overall selective constraint, but with some events of positive selection. In contrast to previous studies, our analyses indicate that even though dsx has indeed evolved as a conserved gene, the common region of dsx has also experienced bouts of positive selection, perhaps driven by sexual selection, during its evolution.


Introduction
Even though several genes involved with premeiotic aspects of reproduction are exceptionally conserved throughout evolution [1], many genes related to reproduction in animals and plants are among the most divergent, evolving faster than those not related to sexual traits [2], possibly because of sexual selection and/or sexual conflict [3,4]. In mammals, for example, many proteins related to fertilization are rapidly evolving [5]. In the same way, selfincompatibility genes in plants [6], gamete-recognition proteins from marine gastropods [7] and male and female reproductive proteins from Drosophila [8,9] have shown patterns of rapid diversification. Most molecular studies so far have concentrated on genes involved in fertilization or male-female interaction and only a few on genes responsible for sexual differentiation. The gene transformer (tra), one of the central genes in the sexual differentiation cascade in Diptera [10,11], fruitless from Anastrepha [12], and the complementary sex determiner gene (csd) from the honey bee cascade [13] are some of the examples of sex-determining genes whose molecular evolution have been studied.
The gene doublesex (dsx) plays a central role in the sex determination cascade. DSX is a transcription factor that acts by activating or repressing genes downstream in the cascade that are responsible for the development of male or female traits in insects [14]. This gene is functionally conserved among insects and other animals and, differently from other genes involved with sex determination [15][16][17], it occupies a conserved position at the bottom of the cascade [18,19]. Notwithstanding its functional conservation, dsx varies both in its genomic organization and splicing pattern among insects [15,20,21]. In Anastrepha, dsx is composed of four exons: the first two are common to both sexes, the third exon is specifically female-expressed and the fourth is male-specific [22]. The common region contains a DNA-binding/ dimerization domain (DM/OD1) at exon 1 and a second dimerization domain (OD2) at exon 2. Additionally, the OD2 spans the 59-end of both female and male exons [22].The DNAbinding domain was described as a novel class of zinc-finger DM motif, which is structurally conserved among metazoans [14] and contains another oligomerization domain (OD1). Because of functional and structural conservation of the DM/OD1 motif and the central role dsx plays in sexual differentiation, this gene is expected to have a direct influence on fitness and should be, in general, reasonably conserved. On the other hand, dsx may be subject to sexual selection since it is also involved in many aspects of reproduction and sexual behavior [23] and, as a consequence, could show signs of positive selection. Some studies performed in insects have shown a pattern of evolution for dsx which is compatible with the conservation of its role in sex determination across insects [18,19,24]. More recently, an analysis of the evolution of dsx in species of the genus Anastrepha suggested that this gene evolved primarily under purifying selection [25]. However, global estimates of evolutionary rates in coding regions have low power to detect signals of positively selected changes [26], and it is possible that even though the gene as a whole shows signs of purifying selection, different parts of the gene may behave differently. So, we investigated patterns of molecular adaptation of dsx in the Anastrepha fraterculus species group by testing the hypothesis that this gene is evolutionarily conserved and evolved solely under purifying selection. To do so, we used different approaches to assess long-term variation in nonsynonymous to synonymous rate ratio (dN/dS), amino acid property changes, and approaches based on population data used to study dsx evolution. Contrary to other studies [25,27], our results indicate that even though dsx has indeed evolved as a conserved gene, some gene regions have experienced bouts of positive selection.

Materials and Methods
Analysis of selection on dsx was performed with a hierarchical strategy. Initially, we evaluated evolutionary patterns using divergence-based methods with sequences from different species of Anastrepha and other Muscomorpha available on GenBank. In this context, we analyzed the data considering variation in nonsynonymous/synonymous ratios (dN/dS = v) and changes in amino acid properties. We also wanted to test whether the same selective pattern would be detected at the population level. Thus, we sequenced a portion of the gene dsx in several individuals of Anastrepha from the fraterculus species group and performed different neutrality tests based both on the site frequency spectrum and the haplotype network topology.
Because dsx undergoes sex-specific alternative splicing, the protein is composed of a common region and sex-specific domains. The common region contains two conserved domains, OD1/DM and OD2, responsible for DNA-binding and protein dimerization, respectively. Considering such a complex organization and the possibility that each part of the gene may have different selective histories, we performed each analysis of selection separately: for analyses considering variation in v and radical changes in physicochemical properties, the subsets were organized into female and male isoforms. In the comparison of evolutionary rates among different segments of dsx, we established four different subsets each representing a specific contrast. First, we inferred a phylogenetic tree for each subset by maximum likelihood using MEGA ver. 5 [29] using the nucleotide substitution estimated in ModelTest [30] implemented in HyPhy [31]. Considering that saturation of synonymous substitutions could underestimate dS rates, which might lead to overestimated v ratios, we checked for saturation in synonymous and nonsynonymous substitutions by plotting proportions of synonymous and nonsynonymous nucleotide differences per synonymous and nonsynonymous sites, pS and pN respectively, against sequence divergence for pairs of taxa. Sequence divergences, pS and pN were estimated using the modified Nei-Gojobori method, with transition/transversion bias equal 2, implemented in MEGA ver. 5 [29]. Because recombination interferes with phylogenetic inferences, we performed three different methods to detect recombination events: the Maximum Chisquare [32] was performed in MaxChi, GENECONV [33], and RDP was implemented in RDP version 3b14 [34]. A cutoff limit of p,0.05 was set for the three tests and, when applicable, a randomization of 1000 replicates was performed.

Positive selection detection by divergence-based methods: changes in dN/dS
We performed the branch-site test [35], which infers positive selection considering v variation in branches and sites, using CODEML, implemented in PAML ver. 4 [36]. A v.1 indicates positive selection because nonsynonymous substitutions have higher fixation probabilities than synonymous mutations due to selective advantages. On the other hand, a v,1 indicates purifying selection caused by selective constraints at codons. For this test, a phylogenetic tree was separated in a foreground branch composed of the lineage of interest, and in background branches, represented by the other lineages [35]. In so doing, these tests evaluate whether patterns of selection in the foreground branches are significantly distinct from the patterns of the background branches, which would indicate that different patterns of selection have affected specific evolutionary lineages [37].
Positive selection was inferred by the contrast of distinct models using a maximum likelihood approach. We used an alternative model (MA), which constrains the codons in the background branches to 0,v,1 or v = 1 (v 0 and v 1 site classes, respectively) and allows the codons in the foreground to have v.1 (v 2 site class) in addition to the other v 0 and v 1 site classes. The null model (MA null ) sets up the same v site classes of the alternative model MA, except that the v 2 site class in the foreground is fixed to 1. Positive selection was inferred with a significant log likelihood ratio test (LRT) of the alternative versus the null model. We used a chisquared null distribution with 1 degree of freedom to assess significance of the LRT.
The signature of positive selection in individual codons was estimated by Bayesian analysis, in which v was allowed to vary among sites. The Bayes Empirical Bayes method [37] was used in conjunction to the branch-site test to estimate which sites were under the influence of positive selection. Because some models are prone to problems of convergence in a likelihood framework, we ran the analyses twice with different initial v values. These tests were performed in the subsets described above.

Contrasting evolutionary rates among different regions in dsx
In order to compare the evolutionary rates among different segments of dsx, we established several different contrasts. For every contrast, we partitioned the molecule in two different regions to compare their evolutionary rates. We initially contrasted the common region to the male-specific and female-specific exons. We further contrasted the male-specific to the female-specific exon. Finally, we contrasted the evolutionary rates of the DM/OD1 and OD2 domains to the rest of the common region, which we defined as variable common region. For all data sets we tested the heterogeneity in evolutionary rates among each partition with the fixed-sites models implemented in PAML [38].
Variation in site heterogeneity was first assessed by contrasting the null model A to the alternative model B to test whether the regions had the same substitution rates. In model A, both partitions were treated as a single data set, with equal substitution patterns, equilibrium codon frequencies (p), transition/transversion rate ratio (k), and nonsynonymous to synonymous rate ratio (v). Model B separated each partition and estimated proportional branch lengths for each one, but estimated the same p, k and v for both partitions. This was followed by contrasting null model C versus alternative model E. Model C allows the partitions to have different branch lengths and p, but estimates the same k and v for both partitions, while model E allows each partition to have independent substitution rates. Then, the contrast between model C vs E tests for differences in k and v between partitions.
Models were contrasted using likelihood ratio tests (LRT) in which the degrees of freedom in each contrast were obtained by the difference in the number of parameters (1 and 2 degrees of freedom for the contrasts A vs B and C vs E, respectively). Because in the fixed-sites models v is simultaneously estimated with k, we cannot unambiguously determine if a significant LRT in the contrasts of models C vs E was due to k, v or both. Then, independent k and v values for these two regions obtained in model E were compared to determine if both parameters substantially differed between partitions.

Positive selection detection by divergence-based methods: changes in physicochemical properties
We also analyzed the evolution of dsx sequences by studying changes in amino acid properties through evolutionary time with the MM01 method [39] which considers changes in many physicochemical properties caused by each nonsynonymous substitution. We carried out these analyses in TreeSAAP ver 3.2 [40] separately for the complete male and female isoforms using each respective maximum likelihood phylogenetic tree. In this analysis, global deviation from neutrality was initially verified by a goodness-of-fit test between a neutral expected distribution and the observed distribution of changes in selected physicochemical properties. The magnitude of nonsynonymous changes was then classified into three categories according to the change in specific physicochemical properties, from conservative (category 1) to very radical substitutions (category 3). For each category, a z-score was calculated, which was compared with a mean among all categories. Significant positive z-scores showed that the number of inferred amino acid replacements significantly exceeds the number expected by chance, thus indicating that the region is under the influence of positive selection. Significant negative z-scores indicated a signal of purifying selection since the inferred nonsynonymous changes were significantly less frequent than expected by chance. When a significant positive z-score was detected in the more conservative category 1, the property was considered to be under stabilizing selection. Conversely, the property was considered to be under destabilizing selection when a significant positive z-score was detected in the more radical category 3. McClellan et al. [39] defined stabilizing selection that tends to maintain the original biochemical attributes of the protein despite the inference of positive selection, and destabilizing selection as that which favors structural and functional shifts in a region of a protein. In this way, positive-destabilizing selection represents a signature of molecular adaptation.
We analyzed 51 amino acid physicochemical properties and considered only 3 categories of magnitude changes despite the 8 categories originally defined [39] to avoid false positives, as suggested by McClellan and Ellison [41]. Only amino acid properties identified by significant positive z-scores in category 3 were considered to be affected by positive-destabilizing selection. To verify which specific regions were affected by positivedestabilizing selection, we performed a sliding window analysis using only the amino acid properties that were significant for positive-destabilizing selection. Sliding windows of 20 codons with a sliding step of one codon were selected for showing the best accuracy [41,42].

Diversity and population level analysis of doublesex
We sampled 51 individuals of A. fraterculus, A. obliqua and A. sororcula from 32 localities in Brazil ( Figure 1 and Table S1). DNA extraction was performed using a modified procedure of Nelson and Krawetz [43] to ensure that the exoskeleton remained intact for future morphological analyses. We amplified a region of 540 bp that spanned almost all of the exon common to both sexes (bases 34 to 574) and contains the DM/OD1 domain using degenerate primers created from homologous sequences of closely related species (59-ATGGTTTCNGAGGATAATTG -39 and 59-GCGNCCNACNACYGANATNGGCAA -39). This region was PCR amplified from genomic DNA in a BioRad PTC-200 thermocycler using a mixture of Taq polymerase and Pfu polymerase to reduce incorporating errors [44]. PCR products were purified by PEG 8000 precipitation [45] and cloned using the InsTAclone kit (Fermentas). At least two recombinant colonies were sequenced per individual with forward and reverse M13 primers using the DYEnamic TM ET dye terminator kit (GE Healthcare) and resolved in a MegaBace 1000 (GE Healthcare). Quality of base-calling was visually inspected in Chromas version 2.31 (http://www.technelysium.com.au). Because these sequences were very similar, they were aligned and visually inspected using BioEdit Sequence Alignment Editor [28]. When equal or very similar sequences differing by 1-2 bases were obtained from clones of the same individual, additional sequencing was performed on different recombinant colonies (up to five total) to confirm eventual homozygosity of individuals or sequencing errors. All sequences were deposited in GenBank (Table S2). Because recombination interferes with phylogenetic inferences, we performed the same tests for recombination used in the divergence-based analyses.
Population level analyses were performed on this 540 bp fragment from the common region. We estimated haplotype (Hd) and nucleotide diversity (p) [46], number of polymorphic sites, synonymous nucleotide diversity (p s ), nonsynonymous nucleotide diversity (p a ) and Watterson's h using DnaSP ver. 4 as an outgroup. Similar to the analysis of the long-term dataset, we separated this region in a conserved common subset, composed only of the DM/OD1 domain and a variable common subset, composed of the common region without the DM/OD1 domain. This subdivision was also used in further selection tests based on haplotype networks. Therefore, diversity indexes and neutrality tests were estimated for these two subsets separately, as well as for them combined (the whole region amplified). The significance levels of all neutrality tests and comparisons of diversity indexes were established by the use of a Dunn-Šidàk correction [51] for multiple test with the Holm's step-down algorithm [52].

Selection tests based on an intraspecific haplotype network
Departures from neutrality were also tested using the distribution of synonymous and nonsynonymous substitutions in tip and interior haplotypes (young and old haplotypes, respectively) in a haplotype network using Fisher's Exact Test on a 262 contingency table [53]. This is a more refined version of the McDonald and Kreitman's test (MKA) [54] used to study intra and interspecific polymorphisms. The haplotype network used in this test was inferred by statistical parsimony using TCS 1.21 [55]. We expected under neutrality that the ratio of nonsynonymous to synonymous mutations in tip haplotypes (younger haplotypes) should be the same as the ratio of nonsynonymous to synonymous mutations in interior haplotypes (older haplotypes). Under purifying selection, the nonsynonymous to synonymous ratio is expected to be lower in interior than in tip haplotypes. On the other hand, when recurrent events of directional positive selection or balancing selection occur on the gene, we expect the nonsynonymous to synonymous ratio to be greater in interior than in tip haplotypes.
We also performed this test contrasting the fate of synonymous and nonsynonymous mutations in conserved (DM/OD1) and variable segments of the common region. The complete test considered synonymous and nonsynonymous mutations in tip and interior haplotypes. In this analysis, the null hypothesis of homogeneity of data was simulated by 1000 random permutations and the test of hypothesis was conducted with an exact permutation test using the algorithm of Roff and Bentzen [56].
We used an additional test for selection which evaluated whether synonymous substitutions have a greater or smaller probability of surviving in the population when compared to nonsynonymous substitutions [12]. In this test, the survival of a mutation is evaluated by the frequency of haplotypes derived from it. The older the haplotype, the greater the probability it will be more frequent [57] and the greater the chance to be internal in an intraspecific haplotype network [58]. It is, then, reasonable to apply these same expectations to the mutations that gave rise to these haplotypes. In this way, we ranked the number of haplotypes in the network derived from internal synonymous and nonsynonymous mutations, and compared the sum of the ranks through an improved normal approximation to Mann-Whitney's U test [59] to establish a comparison between their survival probabilities.
In a neutrally evolving region, we expected the same survival probability through time for synonymous and nonsynonymous mutations, so we did not expect to see a significant difference in their ranks. On the other hand, if the region is under purifying selection, nonsynonymous mutations would have a higher chance of being eliminated and synonymous of being maintained. Therefore, in this case we expected that synonymous mutations would be on average older, and as a consequence would be more frequent, and then, would have significant higher ranks than nonsynonymous mutations. The opposite would be expected if balancing selection or recurrent positive directional selection have occurred on the selected region [60][61][62]. Because recent mutations, such as those in the tips of the network and with frequency of one or two derived haplotypes, may not yet have passed through the evolutionary test of survival and reproduction over time [53], and may be much more affected by drift, we only considered mutations present in at least three haplotypes.

Long-term evolutionary analysis of doublesex
Contrasts between the MA and MA null models of branch-site test considering the male and female isoforms, performed in the framework of the inferred ML phylogenetic trees depicted in Figure 2, confirmed that part of the differentiation that occurred in dsx was due to positive selection ( Table 1). The signals of positive selection were particularly evident in sites outside the DM/OD1 and OD2 domains, as indicated by the Bayes Empirical Bayes analyses ( Figure 3). Because we are contrasting well-diverged lineages, the inferences of v would be biased due to synonymous saturation. However, these inferences based on elevated v rates are warranted, since these contrasts are performed among lineages that show at most 0.07 substitutions per nucleotide site, which is lower than the point in which we began detecting significant synonymous saturation (0.25 substitutions per nucleotide site) ( Figure S1).

Contrast between evolutionary rates in different segments of dsx
Even though the contrast of model A vs B failed to show significant differences in their substitution rates when we compared the common region to the male-specific exon, models B and C were rejected in the contrasts B vs D and C vs E, respectively (Table 2), which may have been caused by significant  Table 1. Parameters estimates and log-likelihood for models MA and MA null and contrast between models MA and MA null by likelihood ratio test (LRT). differences in k and/or v. Because v was practically the same between the two regions, whereas k was clearly the most divergent, the significance of the contrasts was probably due to a difference in the rates of transitions to transversions between male-specific and common exons. When we compared the female exon to the common region, only the contrast of models A vs B was significant, showing a 3times smaller substitution rate for the female exon, suggesting a stronger selective restriction on the female exon. Because the contrasts of models B vs D and C vs E were not significant, there was little evidence that the difference in rates between common region and female exon were due to differences in k and/or v rates. Similarly, the female exon showed a 3.4 times reduction in substitution rate in relation to the male exon, but with little evidence for significant differences in k and/or v rates.

Positive selection detection by changes in physicochemical properties
When we considered the effects of selection on physicochemical changes in the male and female isoforms, 27 and 29 out of 51 properties, for female and male respectively, deviated significantly from neutrality, based on the global goodness-of-fit statistics calculated by the MM01 method in TreeSAAP (Table S3). The significant properties were examined by a more specific analysis which separated nonsynonymous changes into three categories and tested if each category significantly deviates from neutrality. We found three amino acid properties to be under positive-destabilizing selection (category 3) in the male isoform; among them, two amino acid properties were related to changes in alpha-helix structure and one was related to changes in beta-sheet structure (Table 3). However, only the Normalized Frequency of Alpha-helix property was under positive-destabilizing selection in the female isoform (Table 3). We found 16 and 18 amino acid properties under positivestabilizing selection (purifying selection) in female and male isoforms. In the male isoform, six properties were involved in hydrophobic interactions, six in charge/polarity changes, two in beta-sheet structure and four in turn/bend structural changes. In the female isoform, six physicochemical properties were related to hydrophobic interactions, seven to charge/polarity changes, and three to bend/beta-sheet structures (Table 3).
When the male and female isoforms were screened by the sliding windows approach, the majority of codons under positivedestabilizing selection were found in the common exons, particularly in a region near the 39-end of DM/OD1 and in the male-specific OD2 domains (Figure 4). Among the properties previously detected to be under positive-destabilizing selection, only those related to modifications in alpha-helices were significant in the sliding windows approach. Likewise, only in the male isoform could we identify positive-destabilizing selection in the OD2 domain, specifically inside regions predicted to form alphahelices of UBA-like domain responsible for the dimerization of DSX [63,64].

Population evolutionary analysis of doublesex
The 540 bp amplified region of doublesex ( Figure 3B) was cloned and sequenced from 51 individuals across Brazil ( Figure 1) and combined with other sequences available from GenBank [25] for analysis. There were 98 different haplotypes, the great majority of which were singletons. Only six haplotypes were found in more than one copy, three of them shared among different species, including the most common, haplotype 1, which was found in A. fraterculus, A. obliqua and A. sororcula (Table S4). The different recombination tests failed to detect recombination in these sequences. The haplotype network inferred in TCS revealed that most variation did not define species-specific lineages ( Figure 5). Because we failed to detect reciprocal monophyly, many of the parameters were also estimated for the group of species as a whole, considering the clade as the evolutionary unity of interest. For the same reason, MKA tests were not performed because it requires at least some fixed interspecific differences to compare intra and interspecific synonymous to nonsynonymous rate ratios.
Estimates of nucleotide and haplotype diversities, and nonsynonymous and synonymous substitutions were similar regardless of whether A. fraterculus, A. obliqua and A. sororcula were considered separately or as a single group along with other species of the fraterculus group (Table 4). We also failed to observe significant differences in Hd, p and h between the DM/OD1 domain and the remaining sites of the common region.
Tajima's D and Fu and Li's D were significant only in Anastrepha sp. and A. obliqua (Table 5) in the direction of purifying selection.
The same signal of purifying selection was found using Templeton's contingency test, which showed a significant excess of nonsynonymous substitutions in tips and synonymous substitutions in interior branches (Tables 6 and 7). Even when DM/OD1 was removed from the common region and both subsets were independently analyzed, we could still reject the null hypothesis of equality in proportions of synonymous and nonsynonymous mutations in tip and interior branches (Table 8).
We failed to find any significant differences in the number of haplotypes derived from synonymous and nonsynonymous mutations if the full fragment was considered or if only the DM/OD1 subset was analyzed using Mann-Whitney's U test (data not shown). However, an analysis of the variable common region detected a significant difference in survival through time between synonymous and nonsynonymous mutations as measured by the number of derived haplotypes ( Figure 5, Table S5). Despite the smaller number of interior nonsynonymous mutations, they gave rise to a significantly greater number of derived haplotypes (median = 22) than synonymous mutations (median = 13), (Z c corrected for continuity = 1.933; p = 0.052).

Positively selected sites in dsx
Considering that dsx is critical to the differentiation of insects into males and females, mutations in dsx that cause deleterious effects might seriously impair reproduction. Mutations should have stronger effects if they occur in functionally conserved regions, as it has been shown for mutations in either DM/OD1 domain or OD2 domain that produced intersex phenotypes [65]. So, we might expect a strong selective constraint in dsx, with a consequent reduction in nonsynonymous variation.
In agreement with this expectation, previous studies have shown that dsx has been evolutionarily conserved, and concluded that the whole gene has evolved under purifying selection and that positive selection had played a minor role in the evolution of dsx in insects [15,25,27,66]. However, these studies based their conclusions either on the percentage of identity among dsx sequences of different species [15,25,27] or on the overall effect of selection on the gene [25,66] and they indicated the female exon to be the most conserved exon, followed by the male and common exons [15,25,27].
Likewise, when we compared evolutionary rates of female, male and common exons with each other, using the fixed-sites tests, we observed a stronger selective constraint on the female exon. Nonetheless, we failed to observe significant differences in Table 3. Amino acid physicochemical properties under positive destabilizing and purifying selection in male and female isoforms of doublesex. evolutionary rates between the male exon and the common region.
The low values of v point estimates for the three main segments of dsx suggested that, overall, they have evolved selectively constrained. Analyses that use global estimates of evolutionary rates in coding regions, though, have low power to detect signals of positively selected changes, and any gene segment under strong purifying selection could dramatically reduce the global v estimates, even if there were sites outside the selectively constrained region under positive selection [26,38]. Therefore, when we employed analyses which considered variation in v rates in branches and sites, not only average estimates for v, we found that dsx, although still under some selective constraints, showed several positively selected sites in the common region. The signal of positive selection was particularly related to structural changes in alpha-helices, which play important role in the dimerization and interaction of DSX with other transcription factors [14,63,67]. These tests contrast somewhat distant evolutionary lineages, therefore, their power may be limited by homoplasy, which is revealed particularly if there is saturation of synonymous substitutions when compared to evolutionary distances. If we had significant saturation in our estimates, we would have to consider the results of the analyses based on the dN/dS ratio with caution. Since the lineages here investigated are in the range of estimates without evidence of underestimation of dS, the evidence of positive selection by elevated v rates here detected should hold.

Selection and genetic variation at the population level
We wanted to investigate if the positive selection detected by comparing well diverged lineages would affect variation in dsx at the population level and whether we could also detect any evidence of positive selection at this level. Different species in the A. fraterculus group revealed high levels of haplotype diversity (Hd) and an excess of unique and rare haplotypes, with only a handful of haplotypes that were found more than once. Our sampling scheme, which favored few individuals from different and distant localities, may partially explain the excess of rare haplotypes observed, but it does not account for the high proportion of heterozygotes observed. Therefore, it is possible that the level of polymorphism observed is representative of the whole population. In spite of this sampling scheme, the polymorphism levels detected are not associated with fixed species-specific differences, a result consistent with other studies in the fraterculus species group using mitochondrial COI and some nuclear genes [12,22,25,68].
The Templeton's contingency test indicated that the majority of nonsynonymous substitutions were present in tip haplotypes, which generally represent newly arisen mutations [48], rather than in interior haplotypes, which are expected to be older under neutrality [69]. This result along with the significantly negative Tajima's D, and Fu and Li's D and F neutrality tests in Anastrepha sp. and A. obliqua is compatible with an overall signature of purifying selection. Such results concur with what was observed with the haplotype diversity and indicate an excess of rare variants that fit the expectations of background selection [70]. Though these neutrality tests are generally used to detect the overall effects of selection on a DNA region, similar results might also be obtained by some demographic factors, such as Wahlund effect or population size expansion, even in the absence of selection [71]. In spite of the evidence of purifying selection in the sequenced fragment of dsx from some neutrality tests, we observed several nonsynonymous substitutions that were very frequent in the populations, including some that were involved with radical changes in amino acid properties. The Mann-Whitney U test showed that even though internal nonsynonymous mutations are rarer, their descendents are present in higher number of copies in the population than the descendents of internal synonymous mutations. This pattern deviates significantly from the neutral expectation, and suggests the action of positive selection on the common region of dsx also at the population level, again in a framework of overall purifying selection.

Selection patterns in dsx
While little is known about the mechanism by which positive selection could have occurred on dsx (but see [66]), the pattern of variation observed for the common region of dsx might be explained by the dual nature of DSX. On one hand, DSX is a central part of the conserved sex-determination cascade and also interacts with a variety of genes including those involved in homeotic pathways [72][73][74]. On the other, DSX is involved in several other reproductive functions including male courtship behavior, along with FRU [23], and genital development [72,75], which may be subject to sexual and positive selection. Consequently, changes in DSX may be selectively disadvantageous for one function, but not for another. It is even possible that because of such complex interactions, mutations at dsx could behave as slightly deleterious or advantageous mutations depending on the genetic background they are interacting with, a process that resemble sign epistasis [76].
Another mechanism that could favor positive selection might be the interaction among different segments of DSX. DSX is functional as a homodimer, and the dimerization is carried out by DM/OD1 and OD2 domains [14,77]. Both domains are thermodynamically linked, since the binding of DM to DNA is favored by the strengthening of the dimerization by OD2 domain [78]. Thus, important changes in the structure and/or chemical interaction in one domain might induce changes in the other, and might explain the positive-destabilizing selection for alpha-helices detected near the C-terminal of OD1 and inside the OD2.
Such pattern of selective constraint in important domains and diversification in other parts of the gene seems to be common for other sex-determining genes, such as transformer (tra) and fruitless (fru). The gene tra has been shown to be relatively unconstrained, with a v rate ratio moderately higher (v = 0.2 to 0.3) than that found for other genes not related to sex differentiation [10,11]. These data indicate, nonetheless, that tra in Drosophila is still under purifying selection, although mild in great extension of the gene [10,11]. On the other hand, tra has been considered to be under strong purifying selection in some species of the fraterculus group, although some regions rich in serine-arginine show high levels of polymorphism [79] and the authors failed to perform some specific tests for positive selection, such as the ones performed in PAML. A similar pattern of conserved and variable regions has been reported for another gene from the sex-determining cascade, fruitless (fru), which has been shown to be conserved at the BTB dimerization and DNA-binding domains, but highly divergent at the N-terminal extension and connecting region [80], where we detected signals of positive selection among species of Anastrepha [12].

Conclusion
Our study shows that although dsx is relatively conserved, it has an unusual number of nonsynonymous changes in some domains that were shown to be maintained by positive selection. This might indicate that purifying selection maintains gene function directly related to fitness, but may be not be strong enough to counteract positive selection in other domains of the gene that may be under sexual selection. Therefore, part of the variation found in dsx may be explained by adaptive changes promoted by positive selection in regions outside the DNA-binding domain. Furthermore, relaxed selective constraint mediated by pleiotropic effects of this gene across sexes and/or epistatic interaction between dsx and the downstream regulated genes could explain the segregation of nonsynonymous substitutions in the rest of the common region of dsx and in the male-specific exon.

Supporting Information
Material S1 Amino acid alignment of doublesex female isoform used in the divergence-based methods.