Genetic characterization of the oxytocin-neurophysin I gene (OXT) and its regulatory regions analysis in domestic Old and New World camelids

Oxytocin is a neurohypophysial peptide linked to a wide range of biological functions, including milk ejection, temperament and reproduction. The aims of the present study were a) to characterize the OXT (oxytocin-neurophysin I) gene and its regulatory regions in Old and New World camelids, and b) to investigate its genetic diversity and discover markers potentially affecting gene regulation. On average, the gene spans 814 bp, ranging from 810 bp in llama and alpaca and 811 bp in Bactrian camel to 825 bp in dromedary. The larger size in dromedary is mainly due to a 21-bp duplication. The main regulatory elements, including the composite hormone response elements (CHREs), were identified in the promoter, whereas putative binding sites for mature microRNAs in the 3'UTR extend the knowledge of the factors potentially involved in OXT regulation, although their specific biological effect remains to be elucidated. Sequencing of genomic DNA identified 17 intraspecific polymorphisms and 69 nucleotide differences among the four species. One of these polymorphisms (MF464535:g.622C>G) causes, in alpaca, the loss of a consensus sequence for the transcription factor SP1. Furthermore, the same SNP falls within a CpG island and creates a new methylation site, opening future investigations into the influence of this novel allelic variant on OXT regulation. A PCR-RFLP method was set up for genotyping, and the frequency of the C allele was 0.93 in a population of 71 alpacas. These data clarify the structure of the OXT gene in domestic camelids and add knowledge on the genetic variability of a genomic region that has received little investigation so far. These findings open the opportunity for new investigations, including association studies with productive and reproductive traits.

Introduction
Seven camelid species exist today, including both domestic (Camelus dromedarius, Camelus bactrianus, Lama glama, Vicugna pacos) and wild animals (Camelus ferus, Lama guanicoe, Vicugna vicugna). Their origin traces back to the Eocene (40-45 mya), when the first ancestors of the Camelidae family appeared in North America [1].
Approximately 37 million camelids are kept worldwide [2]. The majority (about 75%) are Old World camelids (dromedary and Bactrian camel) distributed in the Afro-Asian drylands, the former mainly in Somalia and the latter in Mongolia and China. Conversely, the Andean highlands are the natural habitat of New World camelids (llama, alpaca, guanaco and vicuna), which are mainly distributed in Peru, Chile and Argentina.
These species are an important economic resource for the rural populations of those areas and play a fundamental socio-cultural role in the development of pastoral zones. Camels are mainly kept for transport and milk production; as dairy animals, they often provide the staple food in pastoralist societies [3,4]. By contrast, there is no historical tradition of milking llamas and alpacas, which have not been exploited for milk. South American camelids (SACs) are kept as multipurpose animals, mainly for fleece, meat and transport, and are often used as trekking animals, contributing to the development of agro-tourism [5][6][7].
Traits such as milk yield, temperament and reproduction are of fundamental interest for improving camelid breeding in many arid and semi-arid areas, and their regulation may be associated with oxytocin release. Oxytocin plays an important role in the regulation of various physiological functions. It is indispensable for milk ejection from the mammary gland [8], promotes social bonds and influences temperament in adult animals [9,10], and stimulates uterine smooth muscle contraction during labour and parturition [11]. Oxytocin is also involved in the control of oestrous cycle length, follicle luteinization and ovarian steroidogenesis. Furthermore, it acts as a neurotransmitter in the central nervous system (CNS), where it plays a role in processes such as cognition, tolerance, adaptation, and complex sexual and maternal behavior, as well as in the regulation of water excretion and cardiovascular functions [12].
The oxytocin-neurophysin I gene (OXT) has been sequenced and fully characterized in many species, including domestic ruminants [13][14][15], where polymorphisms have been associated with milk yield and milk flow [16], traits of great interest also in camelids. Camel milk has been characterized with regard to its protein fractions [17][18][19][20] and biologically active peptides [21]; moreover, a detailed description of llama milk proteins has recently been provided [22], and the genetic basis of casein synthesis has been elucidated [23,24].
Despite the considerable interest in the OXT gene and its role in milk ejection, no information at the DNA or protein level has been reported so far in camelids, and no studies have investigated its genetic polymorphisms. Although genome sequencing has been completed for the dromedary and Bactrian camels, as well as for the alpaca [1,25], the annotation is still incomplete. Furthermore, comparison of orthologous OXT sequences with camelid genomes revealed gaps in this DNA region.
Based on these considerations, we investigated the genetic diversity at the OXT locus in the four domestic camelids. We provide for the first time the full characterization and annotation of the gene and its regulatory regions, and report the first polymorphisms that could affect OXT gene regulation.

Ethics statements
No animals were used in the present study. The samples used herein belonged to DNA collections from previous studies [19,23,24,26], already approved by different ethics committees. Therefore, according to the Committee on the Ethics of Animal Experiments of the University of Torino (D.R. n. 2128 released on 06/11/2015), further ethics approval was not required.

DNA samples
Samples used in this study belong to DNA collections of research groups in different areas of the world. In particular, dromedary DNA samples were provided by Nasarawa State University (Nigeria) and Bactrian camel DNA samples by Justus-Liebig-University (Germany), whereas alpaca and llama DNA samples belong to the collections of the University of Turin (Northern Italy) and Justus-Liebig-University (Germany).
DNA was isolated from blood. Individual samples from unrelated animals belonging to different farms were collected during routine treatments, according to the national animal welfare regulations of each country of origin. Details of the DNA isolation procedure have been reported in previous studies [19,23,24,26]. Briefly, a blood drop from each dromedary was spotted on an individual cellulose filter paper, air-dried at room temperature, transported to the laboratory and then used for DNA isolation. By contrast, fresh blood samples were collected from the other camelids.
The filter paper was soaked (56°C, overnight) in 500 μl sodium-Tris-EDTA buffer with 10 μl proteinase K (10 mg/ml) in the presence of sodium dodecyl sulfate (SDS). DNA was isolated from the resulting lysate according to the procedure described in [27] and resuspended in 100 μl TE buffer pH 7.6 (10 mM Tris, 1 mM EDTA). The fresh blood samples were instead processed with the Spin Blood Mini Kit (Invitek, Germany).
DNA concentration and OD 260/280 ratio were measured with the Nanodrop ND-1000 Spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA). The average concentration was 50 ng/μl, and the 260/280 ratio was higher than 1.8 for all DNA samples.
Twelve DNA samples (three from each species) were chosen for sequencing the whole OXT gene. In addition, DNA samples from 20 dromedaries, 68 alpacas and 20 llamas were used for polymorphism discovery and genotyping.

PCR amplification and sequencing
All primers for amplification and sequencing (Table 1) were designed with DNAsis-Pro software (Hitachi Software Engineering Co., San Bruno, CA, USA). The template was Contig_15904_16, part of an unannotated genome fragment of dromedary (GenBank ID: LSZX01119753) that putatively contains the target gene, based on its homology with the OXT sequences of domestic ruminants (GenBank IDs: X00502; AM234538; LT592265; LT592266).
PCR mixtures were prepared according to [23]. Thermal cycling conditions were: 97°C for 4 min; 35 cycles of 97°C for 45 s, annealing for 45 s at an amplicon-specific temperature (Table 1) and extension at 72°C for 45 s; and a final extension at 72°C for 5 min.
The amplified products were analysed by electrophoresis on agarose gel in 0.5X Tris-borate-EDTA (TBE) buffer, purified using NucleoSpin® PCR Clean-up (Macherey-Nagel GmbH, Düren, Germany) and sequenced in both directions by Sanger sequencing at Microsynth GmbH (Vienna, Austria).
For SNP discovery in dromedary, alpaca and llama, the whole OXT gene, including the promoter and the 3' flanking region, was sequenced in 60 additional DNA samples (20 per species).

Genotyping of the polymorphism g.622C>G in the alpaca promoter
A PCR-RFLP (restriction fragment length polymorphism) method was developed for the SNP MF464535:g.622C>G found in the promoter region of alpaca OXT. A 957-bp DNA fragment was amplified using the primers listed in Table 1 for amplicon 1.
Five μl of each amplicon was digested with 1 U of FastDigest Bfo I endonuclease (5'-RGCGC↓Y-3') (Thermo Scientific) for 15 min at 37°C. The digested products were analysed by electrophoresis in 1.5% agarose gel in 0.5X TBE buffer and stained with SYBR Green (Sigma-Aldrich).
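As a rough illustration of how the expected RFLP patterns follow from the recognition site, the Python sketch below scans a sequence for the IUPAC site RGCGCY (R = A/G, Y = C/T), cuts after the fifth base as in RGCGC↓Y, and maps a set of observed bands to a genotype using the fragment sizes reported for the 957-bp alpaca amplicon. The toy amplicons are hypothetical stand-ins, not the real alpaca sequence, and the 20-bp sizing tolerance for gel estimates is an arbitrary assumption.

```python
import re
from typing import Optional

# IUPAC codes needed for the Bfo I site RGCGC^Y
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}
BFOI_SITE = "RGCGCY"  # 5'-RGCGC^Y-3'
BFOI_CUT = 5          # top-strand cut after the fifth base of the site


def digest(seq: str, site: str = BFOI_SITE, cut: int = BFOI_CUT) -> list[int]:
    """Fragment lengths after complete digestion of a linear sequence."""
    regex = "".join(IUPAC[base] for base in site)
    # zero-width lookahead so that overlapping sites are not missed
    cuts = [m.start() + cut for m in re.finditer(f"(?={regex})", seq.upper())]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Band patterns reported for the 957-bp alpaca amplicon (SNP g.622C>G)
PATTERNS = {
    frozenset({957}): "CC",            # no site: amplicon undigested
    frozenset({726, 231}): "GG",       # both copies cut
    frozenset({957, 726, 231}): "CG",  # heterozygote: cut and uncut copies
}
EXPECTED_BANDS = (957, 726, 231)


def call_genotype(bands: list[float], tolerance: float = 20) -> Optional[str]:
    """Snap gel-estimated band sizes to expected sizes and look up the genotype."""
    snapped = set()
    for band in bands:
        nearest = min(EXPECTED_BANDS, key=lambda size: abs(size - band))
        if abs(nearest - band) > tolerance:
            return None  # band not explained by either allele
        snapped.add(nearest)
    return PATTERNS.get(frozenset(snapped))


# Hypothetical toy amplicons: only the 6-nt site differs between the alleles
toy_g = "A" * 721 + "GGCGCC" + "A" * 230
toy_c = "A" * 721 + "GGCCCC" + "A" * 230
print(digest(toy_g))  # [726, 231]
print(digest(toy_c))  # [957]
print(call_genotype([950, 730, 228]))  # CG
```

Scanning only the top strand is sufficient here because RGCGCY is its own reverse complement.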

Bioinformatic and statistical analysis
Homology searches, sequence comparisons and multiple alignments were performed with DNAsis-Pro (Hitachi Software Engineering Co., San Bruno, CA, USA). Signal peptides and their cleavage sites were predicted with SignalP V4.1 (www.cbs.dtu.dk/services/SignalP). Splice sites were predicted with NNSPLICE ver. 0.9 (http://www.fruitfly.org/seq_tools/splice.html), and branch points with SVM-BP finder (http://regulatorygenomics.upf.edu/Software/SVM_BP/). Putative transcription factor binding sites in the promoter were searched with TRANSFAC 7.0 under the most stringent conditions (85% minimum binding score and 100% similarity of the sequence to the consensus matrix), whereas the 3'-flanking region was analysed for potential microRNA binding sites using TargetScan [28]. Allelic frequencies and Hardy-Weinberg equilibrium were evaluated for the SNP MF464535:g.622C>G using PopGene software ver. 1.32 (University of Alberta, Canada).
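The Hardy-Weinberg check can be sketched as the classical chi-square goodness-of-fit test with one degree of freedom. This is not necessarily the exact computation performed by PopGene, and the genotype counts in the example are hypothetical, chosen only so that the C-allele frequency is about 0.93 in 71 animals, because individual genotype counts are not given in the text.

```python
from math import erfc, sqrt


def hwe_chi_square(n_cc: int, n_cg: int, n_gg: int) -> tuple[float, float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df).

    Returns (frequency of allele C, chi-square statistic, P value).
    """
    n = n_cc + n_cg + n_gg
    freq_c = (2 * n_cc + n_cg) / (2 * n)
    freq_g = 1 - freq_c
    observed = (n_cc, n_cg, n_gg)
    expected = (freq_c ** 2 * n, 2 * freq_c * freq_g * n, freq_g ** 2 * n)
    if min(expected) == 0:
        raise ValueError("monomorphic locus: test undefined")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return freq_c, chi2, p_value


# Hypothetical counts for 71 alpacas (the real genotype counts are not reported)
freq_c, chi2, p = hwe_chi_square(n_cc=62, n_cg=8, n_gg=1)
print(f"f(C) = {freq_c:.2f}, chi2 = {chi2:.2f}, P = {p:.2f}")
```

With a rare allele the expected count of the rare homozygote falls below 1, where the chi-square approximation is unreliable; an exact test is often preferred in that situation.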

OXT gene structure in domestic camelids
Scaffold-level whole-genome assemblies are available for dromedary [1,25], Bactrian camel [29] and alpaca [30], in addition to other camelid genome projects in public databases. Low-coverage assemblies and tentative annotations based on the human genome often lead to redundant information, exon losses and errors in gene annotation, a problem that is even more evident in less investigated species such as camelids. The β-casein gene in the alpaca genome is a good example: exon 3 is out-spliced in human and, because the human genome is used as reference, this exon has not been annotated in alpaca. However, the DNA sequence encoding exon 3 can easily be retrieved approximately 130 bp upstream of the provided Vicugna pacos genomic sequence [23]. This example highlights the need for more experimental data to support the annotation of newly investigated species.
In this context, we sequenced the whole gene encoding oxytocin-neurophysin I (OXT) in the four domestic camelids. The ORF (open reading frame) is 375 bp long and encodes the signal peptide (19 amino acids; nucleotides 34-90) and the 106 amino acids of the mature oxytocin-neurophysin I complex. In particular, the first 9 amino acids of the complex (nucleotides 91-117 of exon 1) form the nonapeptide hormone, followed by the tripeptide processing signal GKR (nucleotides 118-126) and finally by the 94 amino acids of neurophysin I (encoded by the last 27 bp of exon 1, the whole of exon 2 and the first 54 bp of exon 3).
The TGA stop codon is located at nucleotides 54-56 of exon 3, and the polyadenylation signal (aaataaa) at nucleotides 94-100 of the same exon. Splice donor and acceptor consensus sequences conforming to the GT/AG rule were identified at the exon/intron boundaries.
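The coordinates above can be checked with simple codon arithmetic. This sketch assumes 1-based, inclusive positions numbered from the first nucleotide of exon 1; it shows that 19 + 9 + 3 + 94 = 125 codons, i.e. 375 nt, so the stated ORF length appears to exclude the TGA stop codon.

```python
# 1-based, inclusive positions counted from the first nucleotide of exon 1
SEGMENTS = {
    "signal peptide": (34, 90),
    "oxytocin nonapeptide": (91, 117),
    "GKR processing signal": (118, 126),
}
NEUROPHYSIN_AA = 94  # spans exons 1-3, so it is given as a residue count


def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive nucleotide span."""
    length = end - start + 1
    if length % 3:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return length // 3


residues = {name: codon_count(*span) for name, span in SEGMENTS.items()}
residues["neurophysin I"] = NEUROPHYSIN_AA
total_aa = sum(residues.values())
mature_aa = total_aa - residues["signal peptide"]
print(residues)
print(total_aa, mature_aa, 3 * total_aa)  # 125 106 375
```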
Interestingly, a 21-bp duplication (MF464533:g.1567_1608dupGGGGCCGGGCCGGCGGGGACC) was observed only in the dromedary OXT sequence, between nucleotides 1567 and 1608 of intron 2 (Fig 1). This event is unique also in comparison with domestic ruminants. Duplication events are not rare in OXT, which is itself the result of an ancestral gene duplication that also involved the vasopressin gene [12]. Repeated DNA sequences were recently reported in sheep OXT intron 1 and in the goat 3' flanking region [13]. The duplication was monomorphic in the dromedaries sequenced, and its fixation accounts for the longer intron 2 in this species (115 bp) compared with the other camelids (94 bp in Bactrian camel and 93 bp in llama and alpaca). Conversely, the dromedary showed a shorter intron 1 owing to the deletion of a heptanucleotide (MF464533:g.1184_1191delGCTTTTG) (Fig 1).

Similarities and differences of the OXT gene in camelids and ruminants
Overall, the OXT gene in domestic camelids shares a similar organization with its ruminant counterpart [13,14,31], although it shows some differences at both exonic and intronic levels. The cDNA is 41 bp shorter than the bovine one (471 bp vs 512 bp), and the main differences were detected in the part of exon 3 corresponding to the 3' untranslated region (S1 Fig). A similar situation has already been described for genes encoding milk proteins in camelids. For instance, compared with cattle, the 3' UTR of CSN2 is shorter in camel (317 bp vs 323 bp) [19] and CSN1S2 is shorter in llama (981 bp vs 1024 bp) [23].
Intron sizes differ considerably between domestic camelids and domestic ruminants. Intron 1 is shorter in camelids (239 bp in dromedary and 246 bp in the other species) than in cattle (309 bp), whereas the opposite holds for intron 2, which is slightly longer in camelids (93 to 115 bp) than in cattle (90 bp). These differences result in lower intron/exon size ratios in domestic camelids (from 1:1.33 in dromedary to 1:1.38 in the other investigated camelids) than in cattle (1:1.28). Overall, sequence similarity with the OXT gene of domestic ruminants was about 70%.
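The gene lengths and size ratios quoted in this section can be reproduced from the reported exon and intron sizes. This sketch assumes that the 471-bp cDNA applies to all four camelids (a single value is given) and expresses each ratio as 1:x, with x = total exon length / total intron length. Summing exons and introns gives 825, 811, 810 and 810 bp (mean 814 bp); rounded to two decimals, x is about 1.33 for dromedary, 1.39 for the other camelids (presumably reported as 1.38 through truncation) and 1.28 for cattle.

```python
EXON_TOTAL = {"camelids": 471, "cattle": 512}  # cDNA lengths (bp)

# (intron 1, intron 2) lengths in bp, as reported in the text
INTRONS = {
    "dromedary": (239, 115),
    "Bactrian camel": (246, 94),
    "llama": (246, 93),
    "alpaca": (246, 93),
}
CATTLE_INTRONS = (309, 90)


def gene_span(exon_bp: int, introns: tuple[int, int]) -> int:
    """Length of the transcribed gene: exons plus introns."""
    return exon_bp + sum(introns)


def exon_to_intron(exon_bp: int, introns: tuple[int, int]) -> float:
    """x in the ratio 1:x, i.e. total exon length / total intron length."""
    return exon_bp / sum(introns)


spans = {sp: gene_span(EXON_TOTAL["camelids"], i) for sp, i in INTRONS.items()}
mean_span = sum(spans.values()) / len(spans)

for sp, introns in INTRONS.items():
    x = exon_to_intron(EXON_TOTAL["camelids"], introns)
    print(f"{sp}: {spans[sp]} bp, 1:{x:.2f}")
print(f"mean camelid span: {mean_span:.0f} bp")
print(f"cattle: 1:{exon_to_intron(EXON_TOTAL['cattle'], CATTLE_INTRONS):.2f}")
```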

SNP discovery and analysis of genetic diversity
SNP discovery was accomplished by re-sequencing the whole gene, including the flanking regions, in 60 animals (20 dromedaries, 20 alpacas and 20 llamas). Intra-species comparison of the sequences revealed a total of 17 polymorphic sites (9 transitions, 7 transversions and one deletion). New World camelids were more polymorphic (14 sites) than the Old World species (3 sites), and alpaca OXT was the most variable, with 8 polymorphisms (Table 2). The lower genetic variability found in Old World camelids is consistent with the reduction of genetic diversity in these species as a consequence of at least two severe bottlenecks, which negatively affected the effective population size [32].
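The transition/transversion tally can be reproduced mechanically from variant descriptions. A minimal sketch, assuming HGVS-like strings such as those used in this paper; only the four example variants are taken from the text, and the parser handles just simple substitutions and del/dup/ins events.

```python
import re
from collections import Counter

PURINES, PYRIMIDINES = set("AG"), set("CT")
SUBSTITUTION = re.compile(r"g\.\d+([ACGT])>([ACGT])")
INDEL = re.compile(r"g\.\d+(?:_\d+)?(?:del|dup|ins)")


def classify(variant: str) -> str:
    """Label an HGVS-style genomic variant as transition, transversion or indel."""
    description = variant.split(":")[-1].strip()  # drop an accession prefix if present
    if INDEL.match(description):
        return "indel"
    match = SUBSTITUTION.fullmatch(description)
    if match is None:
        raise ValueError(f"unsupported variant description: {variant!r}")
    ref, alt = match.groups()
    # purine<->purine or pyrimidine<->pyrimidine changes are transitions
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


variants = ["MF464535:g.622C>G", "g.668A>G", "MF464534:g.1731C>G",
            "MF464533:g.1184_1191delGCTTTTG"]
print(Counter(classify(v) for v in variants))
```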
Most of the polymorphisms were found in the regulatory regions, 7 in the promoters and 6 in the 3' end (Table 2), whereas no genetic variability was detected in the coding regions. Four SNPs were found in introns, but none of them affected key sites of the spliceosome machinery (donor sites, acceptor sites, branch points or polypyrimidine tracts). Therefore, these intronic variants are not expected to influence splicing and, consequently, gene expression.
Comparison of the OXT sequences of the four species revealed 69 additional polymorphic sites (Table 3). Almost all of these (59 of 69; 85.5%) differentiate Old from New World camelids. Five nucleotide differences were found in exons. One of these, in exon 2, is the transversion g.1417C>G (with the dromedary sequence as reference), which causes the amino acid replacement p.Gly60Ala within the neurophysin coding region (residue 29 of neurophysin). Glycine is typical of the Old World camelids, whereas alanine characterises the SACs. Two further differences are synonymous transitions (g.1078C>T, p.Leu27= in exon 1; g.1421T>C, p.Asp61= in exon 2), and the remaining two fall within the 3' untranslated region (Table 3).
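Whether an exonic difference is synonymous can be checked by translating the reference and variant codons with the standard genetic code. The codons in the example are hypothetical, since the text gives only the amino acids; note that glycine (GGN) and alanine (GCN) codons differ only at the second position, so a Gly/Ala replacement always implies a G/C difference there.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: index = 16*first + 4*second + third, bases ordered T, C, A, G
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitution_effect(codon: str, position: int, alt: str) -> str:
    """Effect of replacing the base at 1-based `position` of a codon with `alt`."""
    codon = codon.upper()
    mutant = codon[: position - 1] + alt.upper() + codon[position:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return f"synonymous ({ref_aa})"
    if alt_aa == "*":
        return f"nonsense ({ref_aa}>*)"
    if ref_aa == "*":
        return f"stop-loss (*>{alt_aa})"
    return f"missense ({ref_aa}>{alt_aa})"


# Hypothetical codons: the actual camelid codons are not given in the text
print(substitution_effect("GGC", 2, "C"))  # missense (G>A)
print(substitution_effect("GAT", 3, "C"))  # synonymous (D)
print(substitution_effect("CTG", 1, "T"))  # synonymous (L)
```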
Interestingly, alpacas and llamas share three polymorphisms (Table 3): the first in the promoter (g.668A>G in alpaca and g.717A>G in llama), the second in the 3'UTR (g.1682C>G and g.1731C>G, respectively) and the third in the 3' flanking region (g.1849C>G and g.1898C>G, respectively). A similar event was observed in sheep and goat OXT for the transition LT592265:g.438T>C, for which convergent evolution, genetic introgression or adaptive genetic variation in the form of trans-specific polymorphism (TSP) have been considered as possible evolutionary mechanisms [13]. However, unlike sheep and goat, whose divergence was eventually fixed by differences in chromosome number and arrangement (sheep 2n = 54, goat 2n = 60), which de facto makes natural hybridization very difficult [33], hybridization between alpacas and llamas is much more extensive [5]. Both species share a conserved karyotype of 74 chromosomes, so their hybrids are fertile and can backcross to either parental species [34]. Consequently, genetic introgression may be the main mechanism responsible for the shared polymorphic sites observed in SACs. This is consistent with the genetic introgression observed in Old World camelids [32] and with the data reported by Kadwell et al. in 2001 [5], who found substantial mitochondrial introgression in the alpaca by cytochrome b sequencing and nuclear introgression in the llama by microsatellite analysis.

Analysis of the regulatory regions
Analysis of the regulatory regions (promoter and 3' end) is important for evaluating the factors involved in the regulation of gene expression. The promoter contains consensus sequences that bind transcription factors controlling gene expression, whereas the 3' end plays an important role in microRNA-mediated repression of gene expression [35]. Any polymorphism falling in these regions might therefore alter binding sites and modify the transcription rate or mRNA stability and, consequently, the amount of protein. For this reason, we extended the sequencing to the 5'-flanking region (more than 950 bp upstream of exon 1) and to the 3'-flanking region (more than 300 bp downstream of exon 3), both for SNP discovery and for the analysis of putative regulatory regions.
Sharing of transcription factor SP1 between domestic camelids and ruminants. Similar homology levels and locations of the main putative transcription factor binding sites have already been reported for the OXT promoters of domestic ruminants [12][13][14][15], suggesting similar regulatory mechanisms in these species. Therefore, we compared the OXT promoter sequences of domestic camelids with the homologous sequences of cattle (AB481096), buffalo (AM234538), sheep (LT592265) and goat (LT592266) to explore possible similarities and differences. Bioinformatic analysis with TRANSFAC 7.0 found that the OXT promoter region in camelids contains at least 18 high-scoring (85%-100%) putative binding sites (S2 Fig). Half of them are binding sites for the transcription factor SP1, which mediates the regulation of target genes by steroid hormone receptors [36]. The high number of these motifs reflects the abundance of GC-rich sequences, which in several gene promoters act as cis-acting elements for a growing number of ligand-activated receptors that interact with SP1 and related proteins [36,37]. Interestingly, only one SP1 site (-122/-113) is shared between domestic camelids and ruminants. The other 8 SP1 sites are only partially shared among camelids, and overall the SACs have one more SP1 site than Old World camelids.
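Promoter scanning of the kind described above can be sketched with a position weight matrix (PWM) and a relative score threshold of 0.85. This is a simplified log-odds scheme, not the TRANSFAC/MATCH algorithm (which uses its own information-weighted core and matrix similarity scores), and the aligned GC-box sites used to build the matrix are hypothetical examples, not TRANSFAC's SP1 matrix.

```python
import math

# Hypothetical aligned GC-box (SP1-like) sites; NOT the TRANSFAC SP1 matrix
TOY_SITES = ["GGGGCGGGG", "GGGGCGGGA", "AGGGCGGGG", "GGGGCGGAG", "TGGGCGGGG"]
BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5,
              background: float = 0.25) -> list[dict[str, float]]:
    """Log2-odds weight for each base at each position, uniform background."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + len(BASES) * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: list[dict[str, float]],
         threshold: float = 0.85) -> list[tuple[int, float]]:
    """Return (0-based start, relative score) for windows scoring >= threshold.

    Relative score = (score - min possible) / (max possible - min possible).
    Only the given strand is scanned.
    """
    seq = seq.upper()
    width = len(pwm)
    best = sum(max(col.values()) for col in pwm)
    worst = sum(min(col.values()) for col in pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip ambiguous bases such as N
        score = sum(col[b] for col, b in zip(pwm, window))
        rel = (score - worst) / (best - worst)
        if rel >= threshold:
            hits.append((start, rel))
    return hits


pwm = build_pwm(TOY_SITES)
print(scan("TTGGGGCGGGGTT", pwm))
```

A full scan would also score the reverse complement of the promoter, since transcription factors can bind either strand.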
Interaction between the OXT gene and hormone receptors. Several members of the nuclear receptor family, including many orphan receptors, could interact with the OXT gene and regulate its expression. For instance, the hexanucleotide motif AGGTCA and its variants (direct or inverted repeats) have been reported as part of the binding sites of estrogen receptors (ERα and ERβ), thyroid hormone receptor (THRα) and retinoic acid receptors (RARα and RARβ) [38,39]. In this study, two putative estrogen response element (ERE) half-sites were identified at positions -85/-81 and -105/-101 but, as in human, the authentic palindromic ERE (GGTCA-TGACC) was not found. This result suggests that the effect of oestrogen on OXT transcription could be indirect rather than direct, as already observed for OXTR [40]. These ERE half-sites fall within a minor composite hormone response element (CHRE), as reported in rat [41], and they can act synergistically with the full CHRE [12] found upstream at position -165/-153.
The CHRE is considered the most conserved element essential for the regulation of OXT expression: deletion of this region in co-transfection experiments resulted in the loss of most of the responsiveness to estrogen and retinoic acid [42]. Unlike in ruminants, the CHRE consensus sequence in domestic camelids (GGTGACCTTGACC) corresponds perfectly to those of humans and rats. Therefore, regulation through the same activators, such as estrogen receptor-β, thyroid hormone, retinoic acid and steroidogenic factor 1 [43], and the same repressors, such as chicken ovalbumin upstream promoter-transcription factor I (COUP-TFI) [44], might also operate in camelids.
Additional consensus sequence characteristics of the OXT gene. Additional consensus sequences characterise the OXT promoter of domestic camelids, in particular a TATA box (-29/-24) and a CCAAT/enhancer binding protein-α (C/EBP-α) site (-683/-674), both conserved between camelids and ruminants. Conversely, a nuclear factor 1 (NF1) binding site (-187/-178) is shared between camelids, cattle and buffalo, but is absent in sheep and goat. NF1 consensus sequences characterise the promoters of many genes, where NF1 may bind synergistically with other transcription factors, including estrogen receptors [45].
Seven polymorphic sites were detected in the promoters of SACs (Table 3). Bioinformatic analysis showed that only the mutation MF464535:g.622C>G, found in alpacas, affects a putative regulatory binding site: the guanine allele abolishes a consensus sequence for the specificity protein 1 (SP1) transcription factor between nucleotides -291 and -281. This sequence is well conserved among domestic camelids, and its location only 115 bp upstream of the CHRE suggests a key role in OXT expression. A change in a consensus sequence can alter the binding strength of the transcription factor and, consequently, gene expression. For instance, an in vitro assay of the sheep OXT promoter recently demonstrated that the C allele of the SNP g.438T>C, which falls in a binding site for the transcription factor Oct-1, reduced OXT promoter activity [13]. The present result therefore opens the way to expression studies of OXT, which has never been investigated in alpaca.
The 3' end is also an important region for the control of gene expression. In particular, microRNAs (miRNAs) bind to the 3' UTRs of their target genes and negatively regulate their expression [35]. Although the biological functions of most miRNAs are unknown, more than 30% of protein-coding genes are estimated to be regulated by miRNAs [28]. The analysis of the 3' end in SACs revealed genetic variability that could affect gene regulation. A total of 6 polymorphic sites were found in the 3' end of the OXT gene, 2 in alpacas and 4 in llamas (Table 3). In particular, the transversions MF464535:g.1682C>G in alpaca and MF464534:g.1731C>G in llama correspond to the same polymorphism shared by the two SACs, located only 19 bp downstream of the stop codon in the 3'UTR. We therefore investigated whether this transversion could affect miRNA binding sites.
Using the homologous human OXT 3'UTR as target, the bioinformatic analysis showed that the SNP in the 3'UTR of SAC OXT affects the binding sites of several miRNAs (miR-4651, miR-608, miR-6737-5p, miR-6819-5p, miR-6747-5p, miR-342-5p and miR-4664-5p), which pair through different site types (8mer, 7mer-m8 and 7mer-A1) with the same seed-matched sequence, CCACCCC (Fig 2).
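The site types named above follow TargetScan's canonical definitions: a 7mer-m8 site matches miRNA positions 2-8, a 7mer-A1 site matches positions 2-7 followed by an A opposite position 1, an 8mer combines both, and a 6mer matches positions 2-7 only. The minimal Python sketch below classifies such sites in a UTR. The miRNA sequence is hypothetical, built only to carry the seed GGGGUGG, whose seed match is CCACCCC; it is not the sequence of any miRNA listed above, and the sketch ignores the conservation and context scores that TargetScan also uses.

```python
PAIR = {"A": "U", "C": "G", "G": "C", "U": "A"}


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(PAIR[base] for base in reversed(seq))


def seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (0-based start of the 6-nt core, site type) for canonical seed sites.

    Core = reverse complement of miRNA positions 2-7. The target base just 5' of
    the core pairs with miRNA position 8 (m8); the base just 3' of the core sits
    opposite miRNA position 1, where an A defines the A1 feature.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp_rna(mirna[1:7])
    m8_partner = PAIR[mirna[7]]
    sites = []
    start = utr.find(core)
    while start != -1:
        m8 = start > 0 and utr[start - 1] == m8_partner
        a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if m8 and a1:
            kind = "8mer"
        elif m8:
            kind = "7mer-m8"
        elif a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


HYPOTHETICAL_MIRNA = "AGGGGUGGAUCGAUCGAUCGAU"  # seed (positions 2-8) = GGGGUGG
print(seed_sites(HYPOTHETICAL_MIRNA, "UUCCACCCCAUU"))  # [(3, '8mer')]
print(seed_sites(HYPOTHETICAL_MIRNA, "UUGCACCCCAUU"))  # [(3, '7mer-A1')]
print(seed_sites(HYPOTHETICAL_MIRNA, "UUCCAGCCCAUU"))  # []
```

The last two calls show, on toy sequences, how a single substitution can downgrade or abolish a seed-matched site.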
Regulation of OXT expression by miRNAs has already been reported in mouse hypothalamus by Choi et al. [47], who demonstrated that miR-24 inhibits oxytocin production by targeting the boundary region between the coding sequence and the 3'UTR. Therefore, we cannot exclude that, in camelids, a region proximal to the stop codon might mediate OXT regulation through binding of miR-4651 (http://www.targetscan.org/cgi-bin/targetscan/vert_71/targetscan.cgi?mirg=hsa-miR-4651), miR-608 (http://www.targetscan.org/cgi-bin/targetscan/vert_71/targetscan.cgi?mirg=hsa-miR-608) or other novel candidate miRNAs. However, the biological significance of these putative candidates remains to be elucidated in future studies.

Genotyping of the SNP g.622C>G in the alpaca promoter
Since the SNP MF464535:g.622C>G in alpacas falls in the promoter region within a putative consensus site for the SP1 transcription factor, a PCR-RFLP protocol was set up for the quick genotyping of 71 alpaca samples. The digestion of the PCR product (957 bp) by Bfo I allows the identification of both alleles (Fig 3).
The CC homozygote is undigested, whereas the GG homozygote is cut into two fragments of 726 bp and 231 bp. The heterozygous genotype shows three fragments of 957 bp, 726 bp and 231 bp (Fig 3). The C allele had a frequency of 0.93, and the χ2 value (1.379) showed no evidence of departure from Hardy-Weinberg equilibrium (P > 0.05). The low frequency of the G allele might be due to a genetic feature of the species or to a recent mutational event, but a selective disadvantage of this allele cannot be excluded. In fact, the SNP g.622C>G also falls within a CpG island, and the guanine (5'-CACGGC-3', herein underlined) creates a further putative methylation site for the upstream cytosine (in bold). DNA methylation is an epigenetic mechanism that regulates gene expression by influencing the recruitment and binding of regulatory proteins to DNA, and OXT is no exception in this respect. Bisulphite sequencing data from the human genome (GRCh37/hg19 assembly) show at least 14 methylated CpG sites within the first 350 bp upstream of OXT exon 1 (http://genome-euro.ucsc.edu/index.html), and a recent study targeting and quantifying DNA methylation across the promoter region of human OXT (Chr20, positions 3052266-3053162) found an average methylation of 47% across the 9 CpG sites investigated [48]. In the same study, differences in the methylation level of the OXT promoter were linked to several overt measures of human sociability.
These observations offer new opportunities to verify whether and how the g.622G variant interferes with gene transcription, through gene expression analysis, DNA methylation studies (bisulphite sequencing) and the investigation of other epigenetic mechanisms (post-translational modifications of histone proteins, non-coding RNAs, nucleosome positioning along the DNA). Such studies will also improve the knowledge and annotation accuracy of genome information in domestic camelids.
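Whether a stretch of sequence qualifies as a CpG island, and whether a single substitution adds a CpG dinucleotide, can be checked with the widely used Gardiner-Garden and Frommer criteria (length ≥ 200 bp, GC content > 50%, observed/expected CpG ratio > 0.6). The text does not state which tool or definition was used here, so these thresholds are an assumption, and the example sequences are hypothetical.

```python
def cpg_stats(seq: str) -> dict[str, float]:
    """Length, GC fraction, CpG count and observed/expected CpG ratio."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "gc": (c + g) / n, "cpg": cpg, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Gardiner-Garden & Frommer-style criteria on the whole sequence (no sliding window)."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_len and stats["gc"] > min_gc
            and stats["obs_exp"] > min_obs_exp)


def cpg_gained(ref: str, alt: str) -> int:
    """Change in CpG count between two alleles (positive = sites created)."""
    return cpg_stats(alt)["cpg"] - cpg_stats(ref)["cpg"]


# Hypothetical alleles: a C>G change next to a C creates one CpG site
print(cpg_gained("TTACCATT", "TTACGATT"))  # 1
print(looks_like_cpg_island("CG" * 100))   # True
```

Real CpG island finders scan with a sliding window and merge qualifying windows; applying the criteria to a single fixed sequence keeps the sketch short.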

Conclusions
The present investigation has elucidated the structure of the OXT gene in domestic camelids, providing fundamental knowledge on the similarities and differences among them and with domestic ruminants. The polymorphisms described provide useful information not only on OXT biodiversity itself, but also for future association studies with traits directly controlled by this hormone.
The identification of putative consensus sequences for the most important transcription factors in the promoter, and of putative microRNA binding sites in the 3' end, suggests that these motifs may be involved in the regulation of gene expression, and opens future possibilities to verify the influence of the novel allelic variant on OXT gene regulation.
Supporting information
S1 Fig. Homology between camelids and ruminants OXT gene. Homology between the complete nucleotide (nt) sequences of the oxytocin-neurophysin I encoding (OXT) gene in domestic camelids (present work) and the corresponding OXT sequences of domestic ruminants. Numbering is relative to the first nucleotide of the first exon (+1), and dashes represent nt identical to those in the upper line. The signal peptide is underlined, the coding region corresponding to the nonapeptide hormone is in bold, and neurophysin I is in bold italics. The tripeptide processing signal (GKR) is double underlined and asterisks indicate the stop codon. The deletion of a heptanucleotide (GCTTTTG) and the 21-bp duplication in C. dromedarius are indicated in bold-shade and wave-underlined, respectively. The polyadenylation signal site is dot-underlined. (DOC)