The giant pouched rat (Cricetomys ansorgei) olfactory receptor repertoire

For rodents, olfaction is essential for locating food, recognizing mates and competitors, avoiding predators, and navigating their environment. It is thought that rodents may have expanded olfactory receptor repertoires in order to specialize in olfactory behavior. Although rodents are the largest clade of mammals and depend on olfaction, relatively little work has documented olfactory repertoires outside of conventional laboratory species. Here we report the olfactory receptor repertoire of the African giant pouched rat (Cricetomys ansorgei), a Muroid rodent distantly related to mice and rats. The African giant pouched rat is notable for its large cortex and olfactory bulbs relative to its body size compared to other sympatric rodents, which suggests anatomical elaboration of olfactory capabilities. We hypothesized that in addition to anatomical elaboration for olfaction, these pouched rats might also have an expanded olfactory receptor repertoire to enable their olfactory behavior. We examined the composition of the olfactory receptor repertoire to better understand how their sensory capabilities have evolved. We identified 1145 intact olfactory receptor genes and 260 additional pseudogenes within 301 subfamilies from the African giant pouched rat genome. This repertoire is similar to those of mice and rats in terms of size, pseudogene percentage, and number of subfamilies. Analyses of olfactory receptor gene trees revealed that the pouched rat has 6 expansions in different subfamilies compared to mice, rats and squirrels. We identified 81 orthologous genes conserved among 4 rodent species and an additional 147 conserved genes within the Muroid rodents. The orthologous genes shared within Muroidea suggest that there may be a conserved Muroid-specific olfactory receptor repertoire. We also note that the description of this repertoire can serve as a complement to other studies of rodent olfaction, as the pouched rat is an outgroup within Muroidea. Thus, our data suggest that African giant pouched rats are capable of both natural and trained olfactory behaviors with a typical Muroid olfactory receptor repertoire.


Introduction
In rodents, olfaction is essential for a number of behaviors including social recognition [1], sexual behavior [2], predator detection [3], and finding food [4]. The mechanism for olfactory perception in rodents uses two systems, the main olfactory system and the vomeronasal of this genus and other members of the subfamily Cricetomyinae (i.e. Beamys and Saccostomus) [34] is the gerbil-like cheek pouches where Cricetomys store food during foraging. However, their physical resemblance to Rattus is most likely a result of convergent evolution from distinct ancestors. Cricetomys species have notably large olfactory bulbs and neocortex for their size, even when compared to other rodents [30,35]. The olfactory bulbs of the pouched rat comprise 19% of the total brain length, while the greater cane rat (Thryonomys swinderianus) has olfactory bulbs that comprise only 9% of the brain, despite the two species occupying similar regions in Africa [35,36]. Furthermore, the neocortex of the pouched rat accounts for 75% of the cerebral cortex (compared to 50% in mice) [36], and the neocortex ratio places the pouched rat in the range of primates [30]. The anatomical features of the olfactory system can predict the size of the OR repertoire [37,38]; thus, we predicted that the pouched rat may also have an expanded OR repertoire compared to its extant relatives. In addition to these anatomical features, recent behavioral research emphasizes the pouched rats' olfactory ability. Pouched rat males can use olfactory cues to discriminate between reproductively available and unavailable females, and similarly, females exhibit preferences for 'competitive' males based on olfactory cues [40]. Furthermore, males can perceive olfactory cues from other males' scent marks, which they countermark in unfamiliar contexts [41]. Pouched rats are even used as 'biodetectors' and can be trained to detect tuberculosis and TNT via olfaction [42,43].
Given their anatomical adaptations for olfaction, their use of olfactory cues in potential competitor and mate assessment, and their use as olfactory biodetectors, we assessed the olfactory repertoire of C. ansorgei to determine how the pouched rat OR repertoire compares to those of other rodents. We hypothesized that in addition to anatomical elaboration of the olfactory system, pouched rats would have an expanded OR repertoire which enables their olfactory behavior.

PLOS ONE
Most rodents that have been previously studied for their OR repertoire, or used for comparative work, are lab models (i.e. guinea pig, rat (Rattus norvegicus), and mouse) [24,28], though recently some others have been described [11]. Thus, our aim was to describe the olfactory receptor repertoire of C. ansorgei as a representative of Cricetomys, which would expand the diversity within described Rodentia olfactory receptor gene repertoires. Here we describe ORs in this species and compare their distribution and number to other rodents (Fig 1). Furthermore, we examine the evolution of ORs in rodents by considering orthologous gene groups, and the similarity of these ORs across species.

Methods
We downloaded the publicly available draft assembly of the pouched rat genome (NCBI ID 75238; CriGam_v1_BIUU).

Detection of OR genes in the pouched rat genome
We created a database of ORs from human (H. sapiens), mouse (M. musculus), rat (R. norvegicus), and squirrel (Ictidomys tridecemlineatus) from Ensembl BioMart [44], using the search term 'olfactory receptor' and the most recent genome versions. We then performed a translated basic local alignment search (TBLASTN) to identify potential ORs within the pouched rat genome. We retained all matches that had e-values of 0 and/or started at position 1, protein alignment lengths of > 133 amino acids, bit scores of > 224, and gaps of 40 or fewer amino acids. These parameters set a threshold below which matches declined quickly in similarity. For mouse matches, this reduced the number of total results from 1,148,779 to 1213, and for rat matches from 1,219,637 to 1219. For human and squirrel matches, all the matches were above the threshold and included 35 and 903 sequences, respectively. We used the Galaxy web platform to conduct a multiple alignment using fast Fourier transform (MAFFT) to locate and remove identical duplicate sequences and combine overlapping sequences [45]. This resulted in 2,399 sequences which coded for potential pouched rat ORs. We then extended the ends of sequences by 100 bp to ensure that we would obtain the full coding region, and excluded any genes of less than 750 bp, because most functional OR genes are >250 amino acids [46], which resulted in 1409 sequences. We obtained all the open reading frames (via EMBOSS) [47] within these sequences (~23k; 84% were shorter than 50 amino acids in length) and filtered out those shorter than 250 amino acids [46]. This left 1,149 intact coding genes. Using these putative functional ORs, we generated a multiple sequence alignment (via MAFFT) and examined any outliers using a phylogenetic tree created with the FastTree program [48].
We used BLAST on any outliers (long branches separate from other groups) in the tree to verify that they encoded olfactory receptors, which led to the removal of 4 sequences that were most similar in sequence to other known G protein-coupled receptors. Thus, of the 1405 remaining candidate OR sequences, 18.5% were labeled 'pseudogenes' that likely did not produce a functional receptor product (but see [49]), due to truncation, deletions or frameshift mutations. Using these 'manual similarity-pipeline' search and selection methods, we found 1145 putative functional (i.e. intact) OR genes.
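As an illustration of the hit thresholds described above, the following Python sketch applies them to TBLASTN hits that have already been parsed. It is not the authors' code: the `Hit` record, its field names, and the reading of "e-values of 0 and/or started at position 1" as an either-or condition combined with the remaining cutoffs are assumptions made for this example.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for the manual similarity pipeline.
MIN_ALIGN_LEN = 133   # alignment length must exceed this (amino acids)
MIN_BIT_SCORE = 224   # bit score must exceed this
MAX_GAPS = 40         # at most this many gap positions


@dataclass
class Hit:
    """One TBLASTN match (hypothetical record; field names are illustrative)."""
    query_id: str
    evalue: float
    query_start: int
    align_len: int
    bit_score: float
    gaps: int


def passes_thresholds(hit: Hit) -> bool:
    """Return True if the hit meets the cutoffs under the assumed reading."""
    # Assumption: an e-value of 0 OR a match starting at position 1 suffices.
    anchored = hit.evalue == 0.0 or hit.query_start == 1
    return (
        anchored
        and hit.align_len > MIN_ALIGN_LEN
        and hit.bit_score > MIN_BIT_SCORE
        and hit.gaps <= MAX_GAPS
    )


def filter_hits(hits):
    """Keep only the hits that pass every threshold."""
    return [h for h in hits if passes_thresholds(h)]
```

Keeping the cutoffs as named constants makes it easy to rerun the filter under a stricter or looser reading of the criteria.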
In addition to the above pipeline, we also used the Olfactory Receptor Assigner (ORA) module to detect potential OR genes and pseudogenes in the entire pouched rat genome [11]. The ORA module detected 1060 unique putative functional OR genes from the pouched rat genome, a number similar to that from the manual pipeline; detailed methods are available in the supplement (S1 File).

Classification based on phylogeny
We produced a phylogenetic tree from the 1145 intact pouched rat OR sequences identified in the manual pipeline search, using MAFFT (L-INS-I method) for sequence alignment and RAxML with the PROTCATJTTF option to produce an unrooted tree with 1000 bootstraps. We then used the method described by Zhang and Firestein (2002) to categorize families by > 50% bootstrap support and 40% protein identity. Subfamilies were defined by > 50% bootstrap support and 60% protein identity [50]. Protein identity was determined through a BLASTP pairwise comparison. High bootstrap support is typically used to determine clades [46]; the pouched rat olfactory repertoire separated into two clades with 76% and 71% bootstrap support. The ORA method, described further in S1 File, used a probabilistic algorithm to identify and assign genes to one of 17 family groups [11].
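The family and subfamily criteria above reduce to two thresholds per clade. The sketch below encodes that rule in Python. The function name `classify_clade` and its arguments are hypothetical. Two further assumptions are made because the text does not settle them: the identity cutoffs are treated as inclusive (at least 40% or 60%), and a clade is summarized by the lowest pairwise identity among its members.

```python
from typing import Optional

MIN_BOOTSTRAP = 50.0        # clade needs more than this bootstrap support (%)
FAMILY_IDENTITY = 40.0      # protein identity cutoff for a family (%)
SUBFAMILY_IDENTITY = 60.0   # protein identity cutoff for a subfamily (%)


def classify_clade(bootstrap: float, min_identity: float) -> Optional[str]:
    """Label a clade from its bootstrap support and the lowest pairwise
    BLASTP identity among its members (both given in percent).

    Returns "subfamily", "family", or None when neither criterion is met.
    """
    if bootstrap <= MIN_BOOTSTRAP:
        return None
    if min_identity >= SUBFAMILY_IDENTITY:
        return "subfamily"
    if min_identity >= FAMILY_IDENTITY:
        return "family"
    return None
```

Under this reading every subfamily also satisfies the family criterion, so the function reports the most specific label that applies.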

Comparative rodent OR phylogenetic tree
We used a MAFFT alignment (L-INS-I method) and RAxML to produce a tree rooted at the branch between Class I and Class II receptors. We used 500 bootstraps with the PROTCATJTTF option to compare the 1145 OR protein sequences of pouched rats with those of mouse (1111 sequences), rat (1310 sequences), and squirrel (733 sequences). Squirrel sequences were obtained from Hughes (2018, Graham Hughes, pers. comm.), while mouse and rat sequences were obtained from Ensembl as above. This comparative rodent OR phylogenetic tree was used to examine relative expansions in the pouched rat OR repertoire. We quantified orthogroups using the program UPhO, using the MAFFT alignment of sequences and an unrooted tree produced with FastTree [48,51,52].
Expansions within gene families. We used the rodent OR phylogenetic tree to identify pouched rat-specific expansions. We compared the number of intact pouched rat genes in each subfamily with the number of the most similar mouse, rat, and squirrel genes. For each noted expansion, we located the most closely related mouse OR gene and identified its subfamily group to determine whether any ligands had been identified within the mouse OR subfamily.
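The comparison of per-species gene counts within a subfamily can be sketched as follows. The text does not give a numerical criterion for calling an expansion, so the ratio test used here (`min_ratio`), the function name, and the species labels are all hypothetical and serve only to illustrate the idea of "relatively more pouched rat ORs" in a clade.

```python
from collections import Counter


def is_focal_expansion(clade_species, focal="pouched_rat", min_ratio=2.0):
    """Flag a clade as a candidate expansion in the focal species.

    clade_species lists the species label of every OR in one subfamily
    clade. The clade is flagged when the focal count is at least
    `min_ratio` times the largest count among the other species
    (an illustrative criterion, not one stated in the text).
    """
    counts = Counter(clade_species)
    focal_count = counts.pop(focal, 0)
    largest_other = max(counts.values(), default=0)
    return focal_count > 0 and focal_count >= min_ratio * largest_other
```

For instance, a clade with 35 pouched rat ORs and 6 mouse ORs, the counts reported for the largest expansion, would be flagged under this assumed criterion.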
Orthologous ORs. We used the UPhO program to identify orthologous groups of ORs among pouched rats and squirrel, mouse, and rat. Orthologs were defined as homologous one-to-one sequences that were located at the same node in the comparative tree but were sequences from different species [53]. We enumerated instances where two or more paralogs (from any of the rodent species) were found. Paralogs were simply defined as multiple orthologs within an orthogroup [51]. The visual structure of branching for each ortholog group was recorded and compared to shared sequence identity. We calculated the shared sequence identity for each ortholog compared to the pouched rat for each ortholog group.
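To make the distinction between one-to-one orthogroups and orthogroups containing paralogs concrete, the following Python sketch classifies a single group from its member list. It assumes orthogroups have already been extracted (for example from UPhO output) into `(species, gene_id)` pairs; the species labels, gene identifiers, and category names are hypothetical.

```python
from collections import Counter

SPECIES = ("pouched_rat", "mouse", "rat", "squirrel")


def classify_orthogroup(members):
    """Classify an orthogroup given (species, gene_id) pairs.

    Returns "one_to_one" when every species in SPECIES contributes exactly
    one gene, "with_paralogs" when all species are present and at least one
    contributes several genes, and "partial" when a species is missing.
    """
    counts = Counter(species for species, _ in members)
    if any(counts[s] == 0 for s in SPECIES):
        return "partial"
    if all(counts[s] == 1 for s in SPECIES):
        return "one_to_one"
    return "with_paralogs"


# Usage with made-up identifiers: two rat genes make this a paralog group.
group = [("pouched_rat", "pr_1"), ("mouse", "mm_1"),
         ("rat", "rn_1"), ("rat", "rn_2"), ("squirrel", "it_1")]
label = classify_orthogroup(group)
```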

Clustering within the genome
We analyzed the relative positions of the intact pouched rat ORs obtained from the manual similarity-pipeline by grouping gene clusters according to positional proximity. The pouched rat genome was parsed into a custom BLAST database (using makeblastdb [54]), which allowed for a conservative estimate of physical clustering, as sequences within the same parsed cluster of the genome would be physically clustered in a chromosome. OR genes were considered 'clustered' when they were located within the same sequence identifier.
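The clustering rule above (ORs sharing a sequence identifier count as clustered) can be sketched in a few lines of Python. The mapping from gene to scaffold is assumed to come from the BLAST search results; the function name and the requirement of at least two genes per cluster are assumptions of this example.

```python
from collections import defaultdict


def cluster_by_scaffold(gene_to_scaffold, min_size=2):
    """Group OR genes by the scaffold (sequence identifier) they sit on.

    gene_to_scaffold maps gene ID -> scaffold ID. Scaffolds carrying at
    least `min_size` genes are returned as clusters (scaffold -> sorted genes).
    """
    by_scaffold = defaultdict(list)
    for gene, scaffold in gene_to_scaffold.items():
        by_scaffold[scaffold].append(gene)
    return {
        scaffold: sorted(genes)
        for scaffold, genes in by_scaffold.items()
        if len(genes) >= min_size
    }
```

Because a fragmented assembly splits true chromosomal neighbors across scaffolds, counts obtained this way are lower bounds on physical clustering.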

Results
We identified 1145 intact pouched rat ORs from the pouched rat genome using our manual pipeline method, a number similar to the repertoire sizes of mouse and rat (S2 and S3 Files). The manual pipeline method and ORA method described similar numbers of intact ORs, and details of the ORA method and results are available in the supplement (S1 File). We identified 260 pseudogenes with the manual pipeline method.
The pouched rat has similar numbers of clades, pseudogenes, and truncated genes to mice and rats, and has a relatively similar number of subfamilies to most other species using the manual pipeline search method and the percent similarity method for family assignment [50] (Fig 2, S4 File).
The pouched rat has similar numbers of intact genes and pseudogenes (Fig 3) compared to other rodents, and similar numbers of subfamilies to mice [25], when the same method of subfamily assignment is used.
To compare pouched rat subfamilies with other species, we produced a gene tree using the sequences identified during the manual pipeline search (Fig 4; S5 File), which illustrates that for most mouse, rat, and squirrel OR subfamilies, there is at least one pouched rat OR. A comparative tree using pouched rat sequences obtained from the ORA was visually similar in structure, and the tree file is provided in the supplement (S6 File). Of particular interest is the number and distribution of lineage-specific gene family expansions. We can identify potential expansions where there are relatively more pouched rat ORs and fewer ORs from the other rodents within a family or subfamily. We have indicated these expansions, and their associated family group membership, in Fig 4 for reference. The first expansion includes ORs from pouched rat subfamily 329 (S7 File), the largest pouched rat subfamily. It is associated with mouse OR subfamily 150 (Human equivalent Family 1, Subfamily E) [25]. There are 6 ORs in the associated mouse subfamily, but 35 pouched rat ORs [25] (Table 2). The second expansion includes ORs from pouched rat subfamily 312 and is associated with mouse subfamily 137. The third and fourth expansions include ORs from pouched rat subfamilies 321 and 317, respectively. Pouched rat subfamily 317 is the second largest pouched rat subfamily. The third expansion is associated with mouse subfamily 135 (Human equivalent Family 7 Subfamily E) and the fourth expansion is associated with mouse subfamily 136 (Human equivalent Family 7 Subfamily E). A fifth expansion includes pouched rat subfamily 12, which is closely related to mouse olfr183 and mouse subfamily 89, and also to rat olr1555. The last noted expansion is associated with mouse subfamily 71. The distribution of ORs by subfamily size, as defined by [50], is shown in Table 1.

Orthologous genes
We found that many pouched rat genes were orthologous to mouse, rat, and squirrel OR genes. Of the 1145 intact olfactory receptor genes, 100 shared orthology with rat, mouse, and squirrel. Of these, 81 were one-to-one orthologies among all four species; the other 19 contained paralogs. Using the percent similarity method for defining families, 10 of the 81 orthologs came from family 38 (ORA equivalent family: OR 51) and 10 from family 101 (ORA equivalent families: OR 1 and OR 7). This further illustrates an unequal distribution of orthologs across the 87 families defined using this method.
The average similarities of the associated pouched rat gene to these orthologous genes were 92.4% for mouse, 92.0% for rat, and 87.6% for squirrel. However, we also found 31 paralogs in our analyses, where there were multiple pouched rat ORs for one rat, mouse, and/or squirrel OR. For example, Pouchie948 associated with rat olr1356, mouse olfr15, and OR2C1 (squirrel) (Fig 5A). For 1-to-1 orthology of mouse, rat, squirrel, and pouched rat, the most typical branching pattern (68/93) assessed by visual inspection of the graphical tree output included the mouse and rat being most closely aligned, followed by the pouched rat gene, then the squirrel OR gene (Fig 5B).

As noted above, in several instances, one or more of the species had a paralogous gene. There were 19 paralog-containing orthologous groups which contained all four species (e.g. one pouched rat gene, two rat genes, one squirrel gene, and one mouse gene). These paralogous groupings were also very similar, with the pouched rat genes sharing an average of 91.2% sequence similarity to mouse, 90.8% to rat, and 82.9% to squirrel. In 4 cases, there were two or three paralogous pouched rat OR genes, which had an average pairwise sequence similarity of 97.1%, suggesting a recent duplication of these genes within the pouched rat lineage. 147 of 1145 pouched rat genes had 1-to-1 orthology with mouse and rat; all three species are in the Muroidea superfamily. Comparatively, there were an additional 34 mouse-pouched rat orthologous genes, 18 rat-pouched rat orthologous genes, and 3 squirrel-pouched rat orthologous genes (Fig 6). There were 9 cases of squirrel-mouse-rat orthologous genes. These latter patterns are consistent with differential loss of genes across rodent lineages.

Clustering of ORs
We examined physical clustering of the ORs based on the locations of the sequences within the gene scaffolds. However, due to the fragmentation of the assembly of the pouched rat genome, these are conservative estimates of OR clustering. We found that there were 173 clusters of ORs comprising 644 of the 1145 ORs. All but 50 of these clusters contained ORs that were clustered completely within a family group. The largest clusters contained 12 and 13 sequences (Table 3).
We examined where these large clusters were located to determine if the expansions were due to potential duplication events in the genome. Twenty of these clusters were located either entirely or partially within these large expansions (Table 4). All of the expansions contained at least one cluster of ORs, supporting the hypothesis that expansions may be partially due to tandem duplications of OR genes.

Discussion
The olfactory receptor repertoire of the pouched rat is similar to rat and mouse olfactory receptor repertoires but has several lineage-specific expansions. We have shown that these expansions are partially attributable to physically clustered OR genes in the genome, suggesting a role of tandemly arrayed gene duplications in generating additional OR diversity, as has been seen in other species [55-57]. Pouched rats were similar in intact olfactory repertoire size and composition to mice and rats (Fig 3), all of which engage in numerous olfaction-mediated behaviors. Though pouched rats have increased olfactory bulb volume compared to similar rodents, their olfactory capabilities are mediated by a typical rodent OR repertoire with lineage-specific expansions in several subfamilies.
We had hypothesized that the pouched rats' olfactory capabilities were supported by anatomical elaboration and an associated genetic expansion of the OR repertoire. In other species, anatomical features, including olfactory bulb size, have been used to effectively predict OR repertoire size [38,58]. We observed some subfamily- and family-specific expansion; however, the overall number of intact ORs did not differ substantially from mice and rats. The potential advantage of their large olfactory bulbs is unclear: it is unknown whether these bulbs enable enhanced scent detection for foraging [59], are an adaptation to being nocturnal [60,61] or territorial [62], or enable better sensitivity to odorants in some other way. Pouched rats may be more 'specialized' by having family-specific expansions while maintaining a similar number of functional OR genes compared to mice and rats. Additional investigation of the relationship between olfactory bulb size and OR repertoire in rodents is necessary to determine if other rodents might have enlarged bulbs with typical OR repertoire sizes.

Expansions in the pouched rat OR gene family
We did not find any evidence that the size of the olfactory subgenome or the number of subfamilies in pouched rat differed substantially from other rodents. Though the estimated number of intact pouched rat ORs differed modestly based on our methodology (1145 for a manual BLAST search, and 1060 using the ORA module), both of these estimates are similar to other rodents (Fig 3) and do not indicate substantial deviation from a typical rodent OR repertoire size. The ratio of putatively functional genes to pseudogenes using the manual pipeline search was also similar to other rodents. It is unclear whether pouched rat pseudogenes may have a retained function [49] or have a loss of function. The ORA module located a larger number of pseudogenes than the manual search method. The majority of these pseudogenes were small truncated sequences, which we interpreted as a byproduct of the fragmented assembly. In the manual search, extremely short sequences were filtered out, leading to a difference in pseudogene estimates between methods (S1 File). In other research using the ORA module, assembly coverage had a significant effect on the number of estimated pseudogenes, but no significant impact on the estimated number of intact or functional OR genes [11]. We have reported results from both methods to support future work that might incorporate these data, since both methods are commonly used in olfactory repertoire research [11,56,63-66]. Improved future genome assemblies will help to determine a better estimate of the number of pseudogenes, and of the number of intact OR genes.
As has been reported for other species, lineage-specific expansions tend to occur in tandemly-arrayed clusters [56,67,68]. Tandemly-arrayed gene expansions are indicative of a series of gene duplications. We combined overlapping sequences and eliminated duplicates during analysis, which could influence the detection of these tandem duplications. Identical sequences within different contigs were eliminated, as we interpreted these as artifacts from assembly. Thus, the number of expansions was conservatively calculated, and future genome assemblies will potentially reveal additional clustering in the genome. Those cases that are shared among mice, rats and pouched rats indicate earlier expansions compared to the tandem arrays specific to pouched rats.

Fig 6. Pouched rats have the largest number of orthologous genes shared with mice and rats. The number of orthologous groupings with squirrels is consistently smaller than orthologies with mice or rats, due to phylogenetic placement. https://doi.org/10.1371/journal.pone.0221981.g006
Unfortunately, we found no ligands in the literature that were associated with mouse or rat ORs within the same subfamilies as the pouched rat OR expansions [14,25,69]. Other species-specific expansions in OR repertoires are thought to support detection of salient odors [70] and ecological niche specialization [10]. Expansions in the repertoire of the pouched rat may relate to diet and niche, and examination of OR genes in sympatric rodents with similar diets and life history would test this hypothesis. Future work on discerning ligands for these ORs might reveal whether pouched rats are primed to specialize in detecting specific types of odorants that are particularly salient and relevant for this species.

Receptor conservation among rodents and within Muroidea
Approximately 7% of the pouched rat olfactory genome shows 1-to-1 orthology with the other rodents in this study: mice, rats and squirrels (Fig 6). Pouched rat gene sequences were more similar to mouse and rat sequences than to squirrel sequences, as expected. This agrees with the current phylogenetic hypothesis for rodents: rat and mouse are placed within the Muridae family together, with pouched rat in a different family (Nesomyidae) but within the Muroidea superfamily (Fig 1). Squirrels are in a different family (Sciuridae), whereas all four species compared in this study are within Rodentia. Given that pouched rats are more closely related to rats and mice than squirrels are to rats and mice, we should expect the OR phylogeny to reflect this phylogenetic relationship, and it does for these orthologous genes. Further supporting this assertion, an additional 12.8% of the pouched rat olfactory receptor genes have orthologous mouse and rat olfactory receptor genes (Fig 6). Pouched rat orthology with only one species was also observed, though this was much more common with either mice or rats than with the more distant rodent relative, the squirrel. Interestingly, there were 9 instances where orthology is shared between mouse, rat, and squirrel; potentially pouched rats have lost functional olfactory genes here during their evolutionary history. Four of these potential losses were from family OR 4 (as defined by the ORA family assignment method [11]), although the loss would not substantially affect the number and proportion of functional to non-functional OR genes given the numbers of pseudogenes in the associated pouched rat OR families. Future work investigating pseudogenes may yield additional insight into the orthologous OR genes among rodents. Unfortunately, no ligands have been characterized for these receptors, so we cannot speculate on how this might impact the olfactory capabilities or behavior of the pouched rat.

Table 3. Distribution of pouched rat olfactory receptors in location clusters.
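The percentages quoted in this section follow directly from the gene counts reported in the Results. A short Python check of that arithmetic is given below; the variable names are illustrative, and the combined share is a derived figure that the text itself does not state.

```python
INTACT_ORS = 1145          # intact pouched rat OR genes (manual pipeline)
FOUR_SPECIES_1TO1 = 81     # one-to-one orthologs shared by all four species
MUROID_ONLY_1TO1 = 147     # additional one-to-one orthologs with mouse and rat


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of the repertoire with one-to-one orthologs in all four rodents.
four_species_share = percent(FOUR_SPECIES_1TO1, INTACT_ORS)
# Additional share with one-to-one orthologs in mouse and rat only.
muroid_share = percent(MUROID_ONLY_1TO1, INTACT_ORS)
# Derived here (not stated in the text): both categories combined.
combined_share = percent(FOUR_SPECIES_1TO1 + MUROID_ONLY_1TO1, INTACT_ORS)
```

The first two values correspond to the "approximately 7%" and "additional 12.8%" figures cited above.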

Additions to our knowledge about olfaction and OR repertoire
Most rodent species with a described OR repertoire are lab models, and include mice [25] and rats [28], while guinea pigs were included in a comparative study which did not focus on guinea pigs specifically [11]. Two other rodent species include some OR information (i.e. an estimate of ORs and pseudogenes based on published genomic information), but it is unclear which search terms were used, and whether the algorithm used would obtain OR families that were not shared by the exemplars (i.e. novel subfamilies). These species included the kangaroo rat (Dipodomys ordii) and the thirteen-lined ground squirrel (also included in this study) [11]; both of these species are native to North America. Here we have shown that the pouched rat OR repertoire of intact genes is very similar in size to those of other Muroids. The pouched rat OR repertoire is also more conserved within Muroids than within Rodentia, as expected. Our characterization of the orthologous genes within Muroidea suggests that there may be a set of conserved Muroid-specific ORs, although additional species will need to be sequenced and compared to support this hypothesis. The addition of the African giant pouched rat to described rodent OR repertoires can also serve as an outgroup for future analyses given its position in the Nesomyidae, a family for which no OR repertoire has previously been described.
We noted a pattern in our gene tree of rodent ORs where squirrel and pouched rat ORs tend to branch earlier than the mouse and rat ORs. Given that mice and rats are more closely related to pouched rats than to squirrels, we expected that orthologous genes might follow this pattern. We assume these genes should be orthologous and bind odorants with similar structures due to their similar protein sequences; however, it could be that small changes in these genes create ORs that bind different odorants, or change perception in other ways [20,21,71]. Identification of these potentially conserved ORs is important for understanding how the OR repertoire might respond to evolutionary pressures over time.

Conclusions
The description of the African giant pouched rat olfactory receptor repertoire adds to our understanding of how these repertoires vary within Muroidea and among rodents, and how these genes may have been conserved or diverged to serve different behavioral purposes. Our results suggested that there could be a conserved set of 'Muroidea' olfactory receptor genes. Whether pouched rats have enlarged olfactory bulbs to enhance their olfactory capabilities in spite of a typical OR repertoire size remains an open, yet interesting question. Further investigation into family-specific expansions within the pouched rat OR repertoire compared to mice and rats might reveal whether these changes have evolved as a way for pouched rats to olfactorily specialize within their habitat.