Genome-Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio)

Background
Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially in aquatic animals such as teleost fish. Studies on teleost Aqps have mainly been limited to a few model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication (WGD).

Results
In this study, we identified a total of 37 aqp genes in the common carp genome. Phylogenetic analysis revealed that most aqps are highly conserved. Comparative analysis was performed across five representative vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated during the evolution of the gene family. We postulate that the expansion of the aqp gene family in common carp resulted from an additional WGD event, and that aqp genes in other teleosts have been lost during their evolutionary history because the functions of these genes are redundant and conserved. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, providing comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant divergence in gene expression was observed, indicating substantial expression or functional divergence among the duplicated aqp genes after the latest WGD event.

Conclusions
To some extent, gene families can be regarded as a unique resource for evolutionary studies. Moreover, the complete set of common carp aqp genes provides an essential genomic resource for future biochemical, toxicological, physiological, and evolutionary studies in common carp.


Introduction
Aquaporins (Aqps) are a large superfamily of major intrinsic proteins (MIPs) that selectively control the flow of water and other small molecules through biological membranes [1]. Aqps therefore play an important role in maintaining osmotic balance in many organisms, especially aquatic organisms. The first water channel protein to be reported was Aqp1, which plays diverse roles in mammalian erythrocytes [1,2]. Since then, an increasing number of genome-wide analyses of aqps have been published, and thousands of aqp genes have been deposited in public databases. For instance, zebrafish (Danio rerio) was reported to have up to 20 aqp genes in its diploid genome [3]. There are 42 aqp paralogs in the genome of Atlantic salmon (Salmo salar), which has experienced an additional round of whole genome duplication (WGD) compared with most other diploid teleost fish [3]. Consequently, many aqp genes are present as duplicated copies in the tetraploidized salmon genome.
Vertebrate aqps were formerly classified into 13 classes. However, a recent study by Finn et al. reported a total of 17 classes in the aqp gene family across various vertebrates. In addition to the 13 aqp classes retained in the human genome, four additional classes occur in various vertebrate genomes: aqp13 among the aquaglyceroporins, aqp14 and aqp15 among the classical aquaporins, and aqp16 in the aquaporin-8 group. The vertebrate aqps are therefore classified as follows: classical aquaporins (aqp 0, -1, -2, -4, -5, -6, -14, and -15), aquaglyceroporins (aqp 3, -7, -9, -10, and -13), aquaporin-8 (aqp 8 and -16), and unorthodox aquaporins (aqp 11 and -12) [3]. Although the overall primary sequences are not well conserved (approximately 30% identity), all Aqps share a relatively conserved molecular structure containing six membrane-spanning segments (TM1-TM6) and five connecting loops (LA-LE) [4]. Each half of an Aqp contains a conserved asparagine-proline-alanine (NPA) motif, located in LB and LE. These two loops form short hydrophobic helices that dip halfway into the membrane from opposite sides, facing each other and participating in substrate selectivity [5]. A cysteine residue at position 189 in LE of human Aqp1 and at position 181 of Aqp2 confers mercury sensitivity [6]. In the biological membrane, Aqps assemble as homotetramers embedded in the lipid bilayer, and each monomer functions independently as a single pore channel [5].
Cyprinids are one of the most important teleost families in the world. Many species have been domesticated as important aquaculture fish for food and ornamental purposes. Despite the importance of Aqps for teleost fish, few studies have been performed on the aqp gene family in cyprinid species other than the model species zebrafish. Common carp (Cyprinus carpio) originated in Europe and Asia, and has been domesticated and introduced into various environments worldwide. It is an economically important species and a model for studies on ecology, environmental toxicology, developmental biology, nutrition, physiology, immunology, and evolutionary genomics. Consequently, substantial genomic resources have been developed in the past decade, including a large number of genetic markers [7][8][9][10], genetic maps [11][12][13], BAC libraries and physical maps [14][15][16], expressed sequence tags (ESTs) [17], and transcriptome sequences [8,18]. Recently, the common carp genome was completely sequenced and assembled [19]. The evidence indicates that common carp has an allotetraploidized genome that experienced an additional round of whole genome duplication (WGD). It has been hypothesized that the duplicated genome provides the basis for its enhanced adaptation to varied environments [20]. It is therefore of interest to determine whether the number of aqp genes is doubled compared with that of diploid teleost fish, and to elucidate the functional evolution of Aqps after the most recent WGD event.
In this study, using all available common carp genomic resources, we identified 37 aqp genes across the genome. Phylogenetic analysis confirmed the gene annotation and nomenclature. We also examined the tissue distribution of aqp genes in common carp. The expression patterns of each gene, together with the results of a comparative study with other vertebrate species, were used to infer the potential functions of aqp genes in common carp. Our examination of the aqp gene family provides insights into the evolutionary and physiological aspects of post-WGD adaptation in common carp.

Aqp gene identification and characterization
We identified a total of 20 aqp genes in the zebrafish (D. rerio) genome and used them as queries to screen for their orthologs in the common carp genome. A total of 37 putative members of the aqp gene family were identified in the common carp genome. The 37 genes are distributed on 19 chromosomes and 18 scaffolds, and are considerably more numerous than the aqp genes in most other vertebrate genomes. For instance, there are 19 aqp genes in human (H. sapiens), 15 in clawed frog (X. tropicalis), and 12 in medaka (O. latipes). Detailed information on their locations, corresponding genomic sequences, coding sequences, and DDBJ database accession numbers is summarized in Table 1.

Phylogenetic analysis and nomenclature of aqp gene family in common carp
In the evolution of higher eukaryotes, WGDs followed by polyploidization, as well as gene loss, have been important recurrent processes. Ancient WGDs, inferred from sequenced genomes and comparative genomics, are prevalent and recurring throughout the evolutionary history of higher eukaryotic lineages [22]. To examine the phylogenetic relationships of aqp genes in teleosts and representative higher organisms, we collected a total of 103 aqp genes from five species: human, clawed frog, medaka, zebrafish, and common carp. Because phylogenetic analysis can also be used to support gene annotation, especially for non-model species [23], we investigated the molecular phylogeny of these aqp genes to validate the orthology of the common carp aqps.
Two phylogenetic dendrograms were constructed from alignments of the Aqp amino acid sequences using the neighbor-joining (NJ, Fig 1) and maximum likelihood (ML, S2 Fig) methods. The two trees showed high topological consistency, supporting the reliability of the inferred phylogenetic relationships of the aqp genes. In both trees, each common carp aqp gene clustered with its respective counterparts from other species, indicating that all genes are highly conserved.
As previous studies have reported, two rounds of WGD occurred in the ancestor of vertebrates, followed by two more in the lineage leading to common carp. Of these, the 3R WGD is known as the teleost-specific (TS) WGD, and the 4R WGD occurred only in some tetraploid teleosts, such as salmonids and some cyprinids [26]. The common carp genome was previously confirmed to be allotetraploidized on the basis of comparative genomic studies [19]. Extensive gene duplications are apparent in the Aqp tree topologies of common carp, which is consistent with previous findings [27][28][29].

Gene duplications and losses in common carp
WGD is one of the major drivers that shaped the evolutionary history of many vertebrates. Ohno has suggested that two rounds of large-scale gene duplication had occurred early in vertebrate evolution [30], and a number of studies of comparative analysis of various gene clusters provided solid evidence in support of Ohno's hypothesis [31][32][33]. An additional round of duplication, also named teleost-specific (TS) WGD, or the 3R WGD [34,35], took place in the common ancestor of all extant teleosts.
As a result of genome duplication, teleost fish usually have two paralogous copies of many genes, whereas only one ortholog is present in tetrapods. It is also generally accepted that, compared with other teleosts, salmonids and some cyprinids such as common carp and goldfish underwent an additional whole genome duplication (the 4R WGD) [27]. Microsatellite analysis [28] and comparison of the common carp linkage map with the zebrafish genome [29] provided critical evidence in support of the 4R WGD event in common carp [36]. A comprehensive estimate based on whole genome datasets suggests that the latest WGD event occurred around 8.2 MYA [37]. Therefore, the significant expansion of aqp genes in the common carp genome may be the result of this additional WGD, which could have caused a sudden doubling of the aqp genes. As shown in Table 2, common carp retained double or more than double the number of copies of each zebrafish aqp gene, except aqp8bb, which strongly suggests that the 4R WGD event was the major contributor to aqp gene family expansion in common carp. Similar results were observed when the common carp aqp genes were compared with the aqp genes in other teleost genomes.
Although mammals and teleosts last shared a common ancestor hundreds of millions of years ago, a growing number of studies have reported extensive conserved synteny between the chromosomes of teleosts and mammals, which supports the hypothesis of an additional genome duplication in fishes [38]. In this study, syntenic blocks of aqp1 genes were constructed, as shown in Fig 2. In zebrafish, the two aqp1 genes are located on chromosome 2, whereas in common carp the four genes are distributed on chromosomes 2 and 4. Here, we propose an evolutionary scenario to explain the gene duplication event in common carp. Assuming that the putative teleost ancestor had the aqp1aa and aqp1ab genes on two different chromosomes, the four orthologs (aqp1aa-1, aqp1aa-2, aqp1ab-1, and aqp1ab-2) would have been distributed on four different chromosomes once genome duplication was complete in common carp. Subsequently, through cis and trans mechanisms, two of these genes moved from their original chromosomes, resulting in three aqp1 genes on chromosome 2 and one on chromosome 4 (Fig 2). This hypothesis offers a reasonable explanation for the distribution of aqp1 genes in the common carp genome.

After duplication, one of the two redundant copies of a gene should theoretically be free to degenerate and be lost from the genome without consequence [39,40]. Most gene pairs formed by a WGD have only a brief lifespan before one copy is deleted, leaving the other to survive as a single-copy locus. We observed only one copy of aqp8bb in common carp but two copies in zebrafish, which differs from the pattern seen for the other genes. The Aqp8bb protein sequences were found to be highly conserved across all the vertebrate species, suggesting that the conserved aqp8bb gene is critical for survival and that very little change is tolerated in its coding sequence and copy number in common carp. Redundant copies of a gene may accumulate detrimental mutations because of relaxed selection on one of the duplicates. Gradually, such copies become pseudogenized and are either deleted from the genome or diverge so far from the parental genes that they are no longer identifiable [41,42]. In addition, aqp2, aqp5, and aqp6 were not identified in any of the surveyed teleost fish but are retained in other vertebrates, which is consistent with a previous report [3] and suggests that these gene losses occurred in the common ancestor of teleost fish after the divergence of teleosts and tetrapods. Gene losses may also have occurred in the aqp gene family of common carp after the latest WGD, because the 37 aqp genes identified are considerably fewer than expected. However, we cannot exclude the possibility that imperfect genome assembly and annotation produced apparent "gene losses", especially in the tetraploidized genome of common carp [43].

Expression profiling of aqp genes in common carp and potential functional inferences
Expression profiling of aqps can help to infer their functions. The relative expression of the common carp aqp genes in adult tissues was evaluated by RT-PCR with isoform-specific oligonucleotide primers. As shown in Fig 3, the aqp gene family exhibited distinct tissue-specific expression. In general, most of the aqp genes were widely expressed, with relatively high expression levels in brain, spleen and intestine and more irregular expression levels in other tissues. We also observed that aqp genes were barely expressed in muscle, suggesting that they play only a minor role in muscle development. Aqp8aa-1 and aqp14-1 were highly expressed in skin, suggesting specific expression and specialized functions in skin development in common carp. As expected, we observed significant differences in aqp expression profiles (Fig 3), which imply functional divergence of the duplicated aqp genes. Consistent expression patterns between the two copies were observed for some aqp genes in common carp, including aqp 0a, 1ab, 3b and aqp 8aa. In contrast, distinct expression patterns between the two copies were observed for 15 aqp genes, including aqp 0b, 1aa, 3a, 4b, 7, 8ab, 9a, 9b, 10a, 10b, 11b, 12, 14 and 15. The two copies of aqp 0b exhibited an almost complementary expression pattern. For the remaining 14 aqp genes, one of the two copies had a broad expression profile across the surveyed tissues and the other had a relatively narrow one. This spatial difference in expression between the two copies suggests rapid functional divergence of these newly emerged aqp copies. It has been proposed that two genes with identical functions are unlikely to be stably maintained in the genome unless the presence of an extra gene product is advantageous [42]. As a result, duplicates that diverge in some functional aspect, for example through subfunctionalization, can be stably maintained in the genome.
The expression profiles of the aqp gene family in common carp suggest that aqp duplicates evolve quickly and that subfunctionalization is common in the tetraploidized genome. We also observed significant differences in gene expression compared with previous studies on model species. Comparison of the expression patterns with those of other vertebrate species, such as zebrafish and human, revealed both conserved and divergent patterns, as expected. In common carp, three of the four aqp1 copies showed tissue-wide expression patterns, while the remaining copy, aqp1aa-2, had a tissue-specific expression pattern. A similar situation occurs in zebrafish, where one of the aqp1 genes is among the most ubiquitously expressed aqps, while the other aqp1 ortholog is expressed only in several specific tissues [44], which is consistent with observations for the human aqp1 ortholog [45]. Furthermore, aqp3 is ubiquitously expressed in most human and common carp tissues; however, no expression is detected in the liver of zebrafish, which differs from its orthologs in both common carp and human. These observations may indicate that expression patterns differ among genes and have species-specific features [46]. The significant expression differences among the duplicated aqp genes provide evidence for gene subfunctionalization after the WGD event. Most likely, the ancestral gene was capable of performing all functions and was expressed broadly across tissues, whereas the descendant duplicate genes perform only part of these functions and are specifically expressed in certain tissues. The functional divergence of duplicated genes may avoid potential adaptive conflicts [45].

Conclusion
In this study, we identified a total of 37 aqp genes in the tetraploidized common carp genome. Phylogenetic, syntenic, and comparative genomic analyses provided a comprehensive picture of the aqp gene family and its distribution in the genome. Our analyses revealed extensive gene duplications in common carp, which resulted from the additional WGD. Expression profiles of the complete set of aqp genes were assessed and revealed extensive functional divergence among aqps in common carp. Our study provides essential genomic resources for future biochemical, toxicological, physiological, and evolutionary studies in common carp.

Ethics statement
This study was approved by the Animal Care and Use Committee of the Centre for Applied Aquatic Genomics at the Chinese Academy of Fishery Sciences. The methods were carried out in accordance with the approved guidelines. Adult common carp were collected from the Breeding Station of Henan Academy of Fishery Research, Zhengzhou, Henan province, China. Fish were euthanized by immersion in MS-222 solution, and all efforts were made to minimize suffering.

Aqp identification and sequence analysis
All available aqp gene sequences and Aqp amino acid sequences from four species (human, clawed frog, medaka, and zebrafish) were downloaded from the public databases Ensembl (http://asia.ensembl.org/), GenBank (http://www.ncbi.nlm.nih.gov/genbank/) and ZFIN (http://zfin.org/). The genomes of these four species have been well characterized and annotated. Amino acid sequences of zebrafish Aqps were used as queries to search all available common carp genomic resources with BLAST tools, using an E-value cutoff of 1e-5, to acquire candidate aqp genes. Reciprocal BLAST searches were then conducted using the candidate common carp aqp genes as queries to verify the candidates. The predicted sequences were extracted, analyzed, and confirmed by BLASTP searches against the NCBI non-redundant protein sequence database (nr).
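The screening and reciprocal verification logic can be illustrated with a minimal Python sketch. It assumes that both searches were exported in the standard 12-column BLAST tabular format (outfmt 6); the sequence identifiers, the toy hit tables, and the rule "accept a candidate if its best reverse hit is one of the zebrafish Aqp queries" are illustrative assumptions, not the exact procedure used in this study.

```python
from collections import defaultdict


def parse_outfmt6(text):
    """Yield (query, subject, evalue, bitscore) from BLAST tabular (outfmt 6) text."""
    for line in text.strip().splitlines():
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[10]), float(fields[11])


def candidates(forward_text, max_evalue=1e-5):
    """Map each carp subject to the zebrafish queries that hit it below the cutoff."""
    hits = defaultdict(set)
    for query, subject, evalue, _ in parse_outfmt6(forward_text):
        if evalue <= max_evalue:
            hits[subject].add(query)
    return dict(hits)


def confirmed(forward_text, reverse_text, max_evalue=1e-5):
    """Keep carp candidates whose best reverse hit (by bit score) is a zebrafish Aqp query."""
    cand = candidates(forward_text, max_evalue)
    aqp_queries = set().union(*cand.values()) if cand else set()
    best = {}
    for query, subject, evalue, bits in parse_outfmt6(reverse_text):
        if evalue <= max_evalue and (query not in best or bits > best[query][1]):
            best[query] = (subject, bits)
    return {c: best[c][0] for c in cand if c in best and best[c][0] in aqp_queries}


def row(q, s, evalue, bits):
    """Build one outfmt 6 line; the alignment columns are placeholder values."""
    return "\t".join([q, s, "80.0", "250", "50", "0", "1", "250", "1", "250", evalue, bits])


# Hypothetical identifiers: dr_* are zebrafish proteins, cc_* are carp gene models.
FORWARD = "\n".join([
    row("dr_aqp1a", "cc_g1", "1e-80", "300"),
    row("dr_aqp1a", "cc_g2", "1e-75", "280"),
    row("dr_aqp1a", "cc_g3", "1e-3", "35"),   # fails the 1e-5 cutoff
    row("dr_aqp3a", "cc_g4", "1e-60", "220"),
])
REVERSE = "\n".join([
    row("cc_g1", "dr_aqp1a", "1e-80", "300"),
    row("cc_g2", "dr_aqp1a", "1e-75", "280"),
    row("cc_g4", "dr_other", "1e-70", "250"),  # best reverse hit is not an Aqp
    row("cc_g4", "dr_aqp3a", "1e-60", "220"),
])

print(confirmed(FORWARD, REVERSE))
```

Keeping every hit below the cutoff, rather than only the single best hit per query, matters in a tetraploidized genome, where one zebrafish gene is expected to recover two or more carp copies.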
The simple modular architecture research tool (SMART, http://smart.embl-heidelberg.de/) was used to predict the conserved domains in common carp Aqps based on sequence homology, and the predictions were further confirmed with the NCBI "conserved domains" prediction software (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?).

Phylogenetic analysis
To annotate the aqp genes, phylogenetic analysis was conducted with reference Aqp proteins from zebrafish and other representative vertebrate species. First, the Aqp protein sequences of the four surveyed species (zebrafish, medaka, frog, and human) were downloaded from the Ensembl database. Then, the translated protein sequences of the common carp aqp orthologs and the Aqp protein sequences from the four other species (103 sequences in total) were aligned using ClustalW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/). The alignment was then manually trimmed to remove all sites that were not unambiguously aligned. We performed neighbor-joining (NJ) analysis in MEGA7 [47] with the p-distance model. A maximum likelihood (ML) tree was also constructed in MEGA7 with default parameters to verify the topology of the NJ tree. A total of 1000 bootstrap replicates were conducted for each analysis.
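For readers unfamiliar with the p-distance used for the NJ tree, the following Python sketch shows how it is computed from aligned sequences: it is the proportion of compared sites at which two sequences differ. The treatment of gaps here (sites with a gap in either sequence are skipped, i.e. pairwise deletion) is an assumption, since the gap option chosen in MEGA7 is not stated, and the short sequences are made-up examples.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are ignored (pairwise deletion,
    an assumption; the gap treatment used in the study is not stated).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Toy alignment with hypothetical sequence names and residues.
toy = {
    "carp_copy1": "MASEF-KK",
    "carp_copy2": "MASDFRKK",
    "zebrafish": "MASEFRKR",
}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```

A matrix of this kind is the input to the NJ algorithm; bootstrap support is obtained by resampling alignment columns with replacement and repeating the distance and tree calculation for each replicate.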

Gene nomenclature
Zebrafish aqp genes were named in accordance with previous studies [44,48]. The aqp orthologs in common carp were named on the basis of their phylogenetic topologies and the BLAST results against their most closely related zebrafish genes. First, the subfamily and gene member were determined for each common carp aqp ortholog from the phylogenetic clades and the BLAST results (for instance, aqp 0, aqp 1, etc.). Then, each common carp aqp ortholog was assigned to its most closely related zebrafish aqp gene and named after it. When more than one copy of a common carp aqp gene clustered with a given zebrafish aqp gene, a numeric suffix was added to each copy (for instance, aqp 0a1, aqp 0a2, aqp 0b1, aqp 0b2, etc.). The names of the aqp genes in common carp and the other surveyed species are listed in Tables 1 and 2.
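The naming rule can be expressed as a short Python sketch. The gene model identifiers are hypothetical, the order in which copies receive their numbers (sorted by identifier) is an arbitrary assumption, and the separator between the zebrafish name and the copy number is left as a parameter because the text uses both a hyphenated form (aqp1aa-1) and an unhyphenated form (aqp 0a1).

```python
from collections import defaultdict


def name_carp_genes(orthologs, sep="-"):
    """Name carp genes after their closest zebrafish gene.

    orthologs maps a carp gene identifier to the name of its most closely
    related zebrafish aqp gene. A numeric suffix is added only when several
    carp copies share the same zebrafish counterpart.
    """
    groups = defaultdict(list)
    for carp_id, zebrafish_name in orthologs.items():
        groups[zebrafish_name].append(carp_id)

    names = {}
    for zebrafish_name, carp_ids in groups.items():
        carp_ids.sort()  # arbitrary but reproducible copy order (assumption)
        if len(carp_ids) == 1:
            names[carp_ids[0]] = zebrafish_name
        else:
            for copy_number, carp_id in enumerate(carp_ids, start=1):
                names[carp_id] = f"{zebrafish_name}{sep}{copy_number}"
    return names


# Hypothetical carp gene models and their closest zebrafish genes.
example = {"cc_g2": "aqp1aa", "cc_g1": "aqp1aa", "cc_g3": "aqp8bb"}
print(name_carp_genes(example))
```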

Syntenic analysis
Syntenic analyses were performed on selected aqp genes across the human, zebrafish, and common carp chromosomes by identifying the positions of the genes neighboring each aqp gene. The organization of the genes on the chromosomes of the model species was obtained from the Ensembl database, while the gene organization of common carp was based on the draft sequences of the common carp genome assembly. Syntenic maps were then drawn based on the gene locations in the surveyed species.
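The idea of comparing neighboring genes can be sketched in Python as follows. This is a simplified illustration, not the procedure used to draw Fig 2: the gene orders and names are invented, the window size is arbitrary, and orthologous neighbors are assumed to carry identical names in both species.

```python
def neighbors(gene_order, focal, k=3):
    """Return the set of up to k genes on each side of the focal gene."""
    i = gene_order.index(focal)
    return set(gene_order[max(0, i - k):i] + gene_order[i + 1:i + 1 + k])


def shared_neighbors(order_a, focal_a, order_b, focal_b, k=3):
    """Genes found in the flanking windows of both focal genes."""
    return neighbors(order_a, focal_a, k) & neighbors(order_b, focal_b, k)


# Hypothetical gene orders along one chromosome in each species.
zebrafish_chr = ["g1", "g2", "g3", "aqp1a", "g4", "g5", "g6"]
carp_chr = ["g9", "g2", "g3", "aqp1a", "g4", "g7", "g8"]

print(sorted(shared_neighbors(zebrafish_chr, "aqp1a", carp_chr, "aqp1a", k=2)))
```

A large set of shared flanking genes supports the interpretation that the two aqp loci descend from the same ancestral chromosomal region.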

Expression profiling of aqp genes
Total RNA from various adult common carp tissues (brain, heart, spleen, liver, intestine, gill, muscle, and skin) was extracted with TRIzol® reagent (Life Technologies, NY, USA). The cDNA used as the PCR template to examine aqp expression patterns was synthesized by reverse transcription using the SuperScript® III Synthesis System (Life Technologies). The β-actin gene was used as an internal positive control, with forward primer 5′-TGCAAAGCCGGATTCGCTGG-3′ and reverse primer 5′-AGTTGGTGACAATACCGTGC-3′. The PCR program was as follows: an initial denaturation step of 5 min at 94 °C; 35 cycles of denaturation (30 s at 94 °C), annealing (30 s, at a temperature that depended on the primer pair), and extension (30 s at 72 °C); and a final elongation step of 5 min at 72 °C. The PCR products were separated by gel electrophoresis (1.5% agarose gel at 150 V) in the presence of ethidium bromide and visualized under ultraviolet light.
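The cycling program can be written down as a small data structure, which makes the step order and the total hold time explicit. This Python sketch is only an illustration: the class and function names are hypothetical, the annealing temperature is left unset because it was primer-specific, and the computed time excludes temperature ramping between steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str
    seconds: int
    temp_c: Optional[float]  # None means primer-specific (not given in the text)


def pcr_program(annealing_temp_c=None, cycles=35):
    """Build the thermal cycling program described in the methods."""
    cycle = [
        Step("denaturation", 30, 94.0),
        Step("annealing", 30, annealing_temp_c),
        Step("extension", 30, 72.0),
    ]
    return (
        [Step("initial denaturation", 300, 94.0)]
        + cycle * cycles
        + [Step("final elongation", 300, 72.0)]
    )


def total_minutes(program):
    """Sum of hold times in minutes, ignoring ramp time between steps."""
    return sum(step.seconds for step in program) / 60


program = pcr_program()
print(len(program), "steps,", total_minutes(program), "min of hold time")
```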