Divergent Evolution among Teleost V1r Receptor Genes

The survival of vertebrate species is dependent on the ability of individuals to adequately interact with each other, a function often mediated by the olfactory system. Diverse olfactory receptor repertoires are used by this system to recognize chemicals. Among these receptors, the V1rs, encoded by a very large gene family in most mammals, are able to detect pheromones. Teleosts, which also express V1r receptors, possess a very limited V1r repertoire. Here, taking advantage of the possibility to unequivocally identify V1r orthologs in teleosts, we analyzed the olfactory expression and evolutionary constraints of a pair of clustered fish V1r receptor genes, V1r1 and V1r2. Orthologs of the two genes were found in zebrafish, medaka, and threespine stickleback, but a single representative was observed in tetraodontidae species. Analysis of V1r1 and V1r2 sequences from 12 different euteleost species indicate different evolutionary rates between the two paralogous genes, leading to a highly conserved V1r2 gene and a V1r1 gene under more relaxed selective constraint. Moreover, positively-selected sites were detected in specific branches of the V1r1 clade. Our results suggest a conserved agonist specificity of the V1R2 receptor between euteleost species, its loss in the tetraodontidae lineage, and the acquisition of different chemosensory characteristics for the V1R1 receptor.


INTRODUCTION
In vertebrates, interindividual interactions related to reproduction are largely dependent on pheromonal communication. Chemosensory structures located in mammals in the nasal cavity and in teleosts in the olfactory rosette represent the major tools used for such exchanges. These structures contain thousands of olfactory sensory neurons, organized in pseudostratified neuroepithelia. Each olfactory sensory neuron extends a single dendrite towards the outside world and an axon which directly connects to the olfactory bulb, in the brain. A single or a few members of a remarkably large olfactory receptor gene repertoire are expressed by each olfactory sensory neuron. Thus, chemical information from the outside world is transported by multiple parallel lines to the olfactory bulb, where it is processed, directed towards various brain areas, and translated into a conscious or unconscious perception.
Four different classes of olfactory receptors have been described in teleosts: odorant, trace amine-associated, vomeronasal type 2 (V2r) and vomeronasal type 1 (V1r) receptors. Odorant receptors number 143 in zebrafish [1], and represent, as in mammals, one of the largest gene families in the genome. Trace amine-associated receptors [2], are also present in fish species (57 genes in zebrafish) [3]. V2rs react to amino acids in teleosts, and are relatively numerous (at least 24 potentially functional V2rs in zebrafish) [4][5][6][7]. V1rs mediate pheromone detection in mice [8,9] and are thought to play a similar role in other species. In mammals, V1r genes are highly divergent, often species-specific, and in rodents number over 100 [10][11][12][13][14]. The role played by the size and variability of these V1r repertoires is not understood yet. Thus, basic questions, such as the degree of potential redundancy between paralogous V1r genes, remain unanswered. However, positive Darwinian pressure has been shown to act on some of these receptor genes, between paralogous or orthologous sequences [12,[15][16][17][18][19], indicating, at least that for some novel genes, novel functions were acquired.
The variability and profusion of V1r genes in some species result from a high rate of gene birth and death [12,13,19]. This unusual characteristic renders the identification of orthologous V1r pairs, if existent at the functional level, difficult to perform with certainty, even in species as closely related as mice and rats. Despite the extensive genomic duplications which affected teleost genomes [20][21][22][23][24], fish V1r genes appear not to have gone through the flourishing diversification and amplification characterizing mammalian pheromone receptor genes. This peculiar situation allows the study of orthologous V1rs between euteleost species (Table 1) whose radiation dates back over 110 mio. years ( Figure 1) [25][26][27][28]. We here report such an analysis.
We identified two different V1r sequences fulfilling these criteria in Dr, and found their corresponding orthologs in Ga and Ol, one of them corresponding to a previously described zebrafish V1r [16]. A single V1r gene was found in Tn and Tr. We named the V1r common to all tested species V1r1 and the one missing in the tetraodontidae lineage V1r2. We hypothesized the existence of a potential pseudogenized or distant V1r2 sequence in tetraodontidae species, and tested this hypothesis by searching for sequences with a potential to encode 10 amino acid triplets conserved in Ol, Dr, and Ga V1r2: HLA (49), LTR (59), PQT (64), GCK (80), NMA (140), APR (153), GFC (167), TRA (228), FGI (245) and PVV (263), in sequences surrounding tetraodontidae V1r1 genes (100kb upstream and downstream of tetraodon V1r1, and 11kb upstream and 10kb downstream of fugu V1r1). We estimated that for a sequence to be considered potentially related  to a V1r2 gene, at least 4 of the 10 conserved triplets were to be present, irrespective of the reading frame or the transcription orientation, and with a distance between each triplet not exceeding+/212 amino acids relative to the one in the known V1r2 sequences. No sequence fulfilling these criteria was identified, suggesting the absence of a V1r2 gene in the vicinity of the V1r1 tetraodontidae gene (but without excluding the potential presence of a highly degenerate V1r2 pseudogene). We also performed PCRs at low stringency and/or used degenerate primers (see Materials and Methods) on genomic Tr and Tn DNA, using V1r2 primers located at positions conserved in all other identified fish V1r2 genes. No amplicon corresponding to a potential V1r2 gene or pseudogene was obtained, again supporting the potential absence of this gene in Tetraodontidae. However, our own experience makes us wary of such negative observations [16].
Mammalian V1Rs exhibit 10 highly conserved amino acid residues (some of which are exclusively observed in V1Rs), and a conserved glycosylation site in extracellular loop 2 [14]. 7 of the conserved residues and the glycosylation site were present in all teleost V1r sequences ( Figure 2).
To dispose of a larger and therefore more reliable dataset, we then expanded our analysis to species pertaining to a single genus, Danio. Using PCR primers specific for Dr V1r1 and V1r2 sequences, we investigated the presence of the two genes in the following species: D. albolineatus (Da), D. choprae (Dc), D. dangila (Dd), D. kerri (Dk), D. kyathit (Dky), D. malabaricus (Dm), D. nigrofasciatus (Dn), D. pantheri (Dp) and D. yoma (Dy) ( Table 1).V1r1 and V1r2 sequences were found in all tested Danio species.
Analysis of the genomic location of the two V1r genes indicated that they were clustered on chromosomes 22 in zebrafish, 5 in medaka, and 17 in stickleback. Their transcription directions were inverted in all cases observed, in a bidirectional orientation, with an unusual short distance between coding sequences [14], ranging from 1.1kb (Dm) to 3.4 kb (Ol) (Figure 3).

A Conserved Sequence in the 59UTR of V1r1
V1r receptor genes in mice exhibit a characteristic not observed in odorant receptor genes: an unusual conservation of noncoding sequences between members of a given family, which includes transcribed and non-transcribed regions [18]. We evaluated the potential homology of non-coding sequences between orthologous teleost V1r1 genes (Dr, Ol, Ga, Tr, Dm, and Tn) by pairwise comparisons of the sequences using the Pipmaker analysis tool. We identified a 75 nucleotide segment located 59 to the V1r1 coding sequence containing 31 invariant nucleotides ( Figure 4). The position of this conserved sequence relative to the start site ranged from -81 (Dr) to -185 (Ol) ( Figure 3). 59 RACE was performed and one transription start site was identified 404 bp upstream of the V1r1 first ATG codon ( Figure 3), indicating that the conserved sequence was part of the V1r1 59UTR.
The conserved segment was not found duplicated 59 toV1r2 sequence, and was not identified associated with any V1r mouse gene. No significant conservation of other non-coding segments was identified neighboring V1r2 sequences.

Expression of V1r1 and V1r2 in the Zebrafish Olfactory Rosette
We investigated the potential expression of V1r1 and V1r2 in the olfactory rosette using two approaches. We first performed RT-PCRs on olfactory rosette extracts from male and female adult zebrafish using primers specific for V1r1 or V1r2. V1r1 and V1r2 transcripts were consistently observed in this tissue ( Figure 5a). No transcripts were found in other organs, including barbels, lips, heart, gills, muscle and brain (data not shown). Second, and in order to obtain a more precise picture of the cells expressing the V1r receptor genes, we performed in situ hybridizations on olfactory rosette sections. Single neurons located in the sensory neuroepithelium did express V1r1 transcripts, similarly to what was observed for cells expressing V1r2 (Figure 5b-d). No evidence of sexual dimorphism in the expression of the V1r receptor genes was observed (data not shown).

Unequal Evolutionary Rates Between V1r1 and V1r2
Two phylogenetic trees, including 24 V1r1 and V1r2 teleost genes or their corresponding proteins, were generated using Maximum Likelihood methods ( Figure 6). We evaluated potential evolutionary differences between the V1r1 and V1r2 clades by performing a Relative Rate Test, and found that the V1r1 clade evolved significantly faster than the V1r2 clade (p = 0.015) (see Materials and Methods).
To determine the origin of the acceleration detected in the V1r1 clade, we performed a series of tests of increasing sensitivity, looking at the whole dataset or at specific clades, at full-length sequences or codon by codon, and aimed at measuring the selective pressures acting on these genes. Two potential explanations for the observed differential evolutionary rates, which are not mutually exclusive, involve relaxation of the purifying selection, and positive Darwinian selection.
The dN/dS ratio (v) for the overall dataset was first calculated using the one-ratio model (M0) as implemented in the PAML software (see Materials and Methods). An v of 0.099 was estimated, suggesting that a strong purifying selection is globally acting on V1r receptors. Using the two-ratio model, the v estimates for the V1r1 and V1r2 clades were 0.147 and 0.050 respectively ( Figure 6). The Likelihood Ratio Test (LRT) indicated that the two-ratio model fit the dataset significantly better than the one-ratio model ( Table 2), confirming that the V1r1 clade has a 36times higher rate of non-synonymous mutations. However, no evidence for positive selection (v.1) was found.
To further investigate potential changes in selective pressure on specific residues, we applied site-specific models to our dataset (see Materials and Methods). The comparison of different site-specific models suggested heterogeneous evolutionary rates within V1r sequences. Thus, the M3 model (discrete with K = 2) fit the data significantly better than the M0 model (one ratio). The same M3 model allowing 3 different v, fit the data better than with 2 different v ( Table 2). This last model proposes 29.3% of highly conserved sites (v0 = 0.016), 60.2% of sites under strong purifying selection (v1 = 0.103), and 10.6% of the sites under moderate purifying selection (v2 = 0.455). Furthermore, the M8 model, which allows positive selection and which fit the data better than the M7 model ( Table 2), indicated that 3% of the amino acid sites were under positive selection. The identities of the positively selected residues were however not statistically supported (BEB calculations, see Materials and Methods).
We then analyzed our dataset with a site-specific approach, but looking at specific branches. We used the branch-site model D (see Materials and Methods) to identify potential selective pressure differences for a category of sites between the two paralogous clades. The LRTs involving model D (with K = 2 or K = 3 categories of sites) and its nested model M3 indicated that model D with K = 3 fit our data the best (Table 2). This model proposed that for both genes, 45% of sites were under marked purifying selection (v0 = 0.13), 9% under moderate purifying selection (v1 = 0.48), and that 46% of the sites displayed a 106relaxation in the V1r1 clade when compared to the V1r2 clade (v2V1r2 = 0.005 vs. v2background = 0.048). However, this large Thus, models with additional categories of v, applied either to the whole dataset or to specific groups within the tree, were systematically preferred to models of lower complexities ( Table 2).
The potential existence of rare positively selected sites in a subset of branches was then assessed. Using again a branch-site model, model A (see Materials and Methods), we analyzed the 21 longest internal and terminal branches of the V1r1/V1r2 clades. We found 5 branches exhibiting positively selected sites in the V1r1 clade and a single branch in the V1r2 clade. Among these branches, we were able to identify 9 significant positively selected sites, 5 of which were located in transmembrane regions ( Figure 2). All significant sites pertained to the 5 branches of the V1r1 clade (Table 3, Figure 6).
In accordance with our previous estimates (results of model M8), the percentage of these residues was limited, ranging from 0.4% to 7.0% (Table 3). 1 to 3 sites per branch displayed an v significantly above 1 (BEB calculations) ( Table 3).
Our approach, involving the use of models of increasing finesse, thus led to the confirmation of differential evolutionary rates between paralogous V1r clades, and to the detection of positive selection between orthologous V1r genes.

DISCUSSION
We here describe two linked teleost V1r genes, V1r1 and V1r2, and identify their orthologs in multiple fish species, taking advantage of their physical proximity and relative conservation.
Apparently, not all teleosts share the same V1r repertoire; we were indeed unable to identify a V1r2 receptor gene in the two tested tetraodontidae species. This negative observation may naturally reflect a failure to identify an existing sequence, due to a still incomplete coverage of the tetraodon and fugu genomes. This seems however unlikely since these genomes are close to being fully sequenced. Moreover, since V1r1 and V1r2 genes are found in close association in all teleost species tested (less than 3.5 kb apart), and since relatively large genomic sequences surrounding the tetraodon and fugu V1r1 were available and searched, we propose that our findings reflect a gene loss, which specifically affected the tetraodontidae lineage. This potential genus-specific gene loss is consistent with the remarkably compact genome of members of this lineage, genome that can be as small as 390 Mb in fugu and 340 Mb in tetraodon (while zebrafish and medaka genome sizes are 1.7 Gb and 800 Mb respectively). This limited genomic material translates into a contraction of gene number, shrinking known to affect many gene families, including olfactory receptor gene repertoires. Thus, a comparison between fugu and zebrafish odorant receptor genes indicates a drastic reduction in the fugu repertoire size (143 vs. 44 and 42 odorant receptor genes in zebrafish, fugu and tetraodon respectively) [1].
We identified a short sequence in the 59UTR of V1r1 transcripts, conserved between multiple teleost species, including some whose common ancestor dates back over 100 mio. years. This observation is reminiscent of the remarkable 59UTR Figure 6. Teleost V1r1 and V1r2 exhibit different evolutionary rates. (a) Phylogenetic analysis of teleost V1r genes. A rooted DNA tree was generated, based on an alignment of 24 teleost V1r sequences and three mouse V1rs (V1rf3, V1re4 and V1rb2). Similar v were obtained for each V1r family using two-or three-ratio models. The V1r1 clade v value was 36times higher than the one of the V1r2 clade (see Table 2). Positive selection was detected on branches A-F and asterisks indicate branches in which positively selected sites were identified. (b) Unrooted tree based on the amino acid alignment of the 24 teleost V1r receptors. doi:10.1371/journal.pone.0000379.g006 Fish V1r Divergence PLoS ONE | www.plosone.org conservation present in mouse V1ra and V1rb transcripts [18]. It may point to a transcriptional or translational regulatory element acting on fish V1r1 genes, and possibly also on V1r2. We also report a peculiar genomic organization of the fish V1r1 and V1r2 genes. As little as 1.1 kb separates the coding sequences of the two genes, leaving limited space for two 59UTRs and two promoters since their transcriptional direction is inverted. The zebrafish V1r intergenic sequence, whose very limited size renders it easily amenable to genetic manipulation, will surely prove to be of interest in the study of the mechanisms regulating V1r expression,    which at present remain completely unknown. We observed that unequal selective pressures affected the evolution of the V1r1 and V1r2 clades. The V1r1 clade exhibits a relatively relaxed negative pressure, with on given branches, specific residues under positive Darwinian selection, a situation observed for some mammalian V1rs. This finding suggests a possible variable role played by orthologous V1R1 receptors, or at least non-homogenous agonist response profiles between the latter (due for example to speciesspecific coevolution of agonist-receptor pairs). The situation is clearly different for the V1r2 clade: a strong purifying pressure is at work on V1R2 receptors, suggesting a common role and/or agonist profile between orthologous V1R2 proteins. Such theories will be put to test as soon as we identify natural agonists for fish and mammalian V1rs. But the road is apparently long before we dispose of a general view of agonist-V1r receptor pairs in vertebrates. Indeed, despite an identification of the first V1r genes over a decade ago [29], only a single such pair has been identified [8]. This situation, partly resulting from the difficulties faced to express these receptors in vitro, also reflects a more problematic situation, which is the very limited number of known potential agonists.

Identification of V1r genes
The

RT-PCR
mRNA was extracted (RNeasy extraction kit, Qiagen) from 40 male and female adult olfactory rosettes. cDNA was synthesized using a random hexamer primer. Primers used for Dr V1r2, b-actin and OMP genes were previously described [16]. Primers for Dr V1r1 were the following: TGT TTC TGT CCC TGG TGC TGG T and AGC CGG AGG TCA GTG ATC AG. PCR conditions were previously described [16].

Phylogenetic trees
Nucleotide sequences of 24 teleost V1r sequences and mouse V1rb2, V1re4 and V1rf3 were aligned using ClustalX [32] and manually rearranged using the Bioedit alignment software (v. 7.0.5.3). The MODELTEST v3.7 program [33] was used to determine the model of DNA sequence evolution that fit our data best. The best fit model was the General Time Reversible (GTR) with a gamma shape distribution of evolutionary rates (a = 1.18; 8 categories of sites). Phylogenetic tree reconstruction based on DNA sequences was performed using the Maximum Likelihood (ML) method as implemented in the program Phyml [34] and using NNI and SPR branch swapping methods, or imposing different starting topologies to avoid local optima. The amino acid tree reconstruction was performed using the JTT model of amino acid changes and a gamma shape distribution of evolutionary rates (a = 2.15; 4 categories of sites). The same topology was retrieved when using two different starting trees: the Bionj Tree or the best topology found using the DNA alignment. For DNA and amino acid analyses, support for branches was assessed by bootstrap analyses of 500 replicates.

Evolutionary rates and selective pressures
Differences in the evolutionary rates of the two paralogous V1r copies were statistically tested using the RRtree program [35]. Selective pressures acting on the V1r receptor genes were assessed using different models as implemented in the PAML software v.3.15 [36]. Couples of nested models were compared using Likelihood Ratio Tests (LRTs) to statistically determine the best model. Twice the difference of the likelihood values of the tree obtained under each model approximately follows a chi-square distribution and, together with the number of degrees of freedom (df) separating the two models, allows the calculation of the associated p-value.
To evaluate a possible unequal ratio of non-synonymous vs. synonymous substitutions (v) in the two paralogous V1r copies, we performed an LRT comparing the two-ratio model (one v per clade) with the one-ratio model (M0) where v is constant. Varying selective pressure along the genes was assessed by a LRT test comparing the site-specific model M3 (discrete) with 2 or 3 classes of sites displaying different v, against the M0 model (one-ratio).
The branch-site model D [37], an extension of the model M3 which allows selective pressure at one class of sites to be different in two parts of the phylogeny, was used to test for divergent selective pressure between the two paralogous V1r genes, by applying an LRT on models D vs. M3.

Positive selection
Positive selection was first assessed using two LRTs involving sitespecific models of increasing complexity. Model M2a (which allows one group of sites to have an v.1), was compared to model M1a (where v cannot exceed 1). Discrete model M8 (in which the v value can vary according to a beta distribution (between 0 and 1) and which includes an extra category of sites with v.1), was compared to model M7 (in which v can only vary according to the beta distribution). Branch-site models were used to identify positive selection acting on sites along specific branches of the phylogenetic tree. Model A [38] was used to identify positive selection acting on sites along specific branches of the tree. This model, which allows one category of sites to evolve under positive selection (v2.1), was compared by LRT to that same model with an v2 fixed at 1 (Test 2). This LRT test is the most reliable test according to Zhang et al. [38]. Significance of positively selected sites was evaluated using Bayes Empirical Bayes (BEB) calculations [39].

Animals
Animals were housed and handled accordingly to the guidelines and regulations of the institution and of the state of Geneva.