Construction, De-Novo Assembly and Analysis of Transcriptome for Identification of Reproduction-Related Genes and Pathways from Rohu, Labeo rohita (Hamilton)

Rohu is a leading candidate species for freshwater aquaculture in South-East Asia. Unlike common carp the monsoon breeding habit of rohu restricts its seed production beyond season indicating strong genetic control over spawning. Genetic information is limited in this regard. The problem is exacerbated by the lack of genomic-resources. We identified 182 reproduction-related genes previously by Sanger-sequencing which were less to address the issue of seasonal spawning behaviour of this important carp. Therefore, the present work was taken up to generate transcriptome profile by mRNAseq. 16GB, 72bp paired end (PE) data was generated from the pooled-RNA of twelve-tissues from pre-spawning rohu using IlluminaGA-II-platform. There were 64.97 million high-quality reads producing 62,283 contigs and 88,612 numbers of transcripts using velvet and oases programs, respectively. Gene ontology annotation identified 940 reproduction-related genes consisting of 184 mainly associated with reproduction, 223 related to hormone-activity and receptor-binding, 178 receptor-activity and 355 embryonic-development related-proteins. The important reproduction-relevant pathways found in KEGG analysis were GnRH-signaling, oocyte-meiosis, steroid-biosynthesis, steroid-hormone biosynthesis, progesterone-mediated oocyte-maturation, retinol-metabolism, neuroactive-ligand-receptor interaction, neurotrophin-signaling and photo-transduction. Twenty nine simple sequence repeat containing sequences were also found out of which 12 repeat loci were polymorphic with mean expected-&-observed heterozygosity of 0.471 and 0.983 respectively. Quantitative RT-PCR analyses of 13-known and 6-unknown transcripts revealed differences in expression level between preparatory and post-spawning phase. These transcriptomic sequences have significantly increased the genetic-&-genomic resources for reproduction-research in Labeo rohita.


Introduction
The Indian major carp Labeo rohita (Hamilton), a cyprinid is a leading candidate species for freshwater aquaculture not only in India but also in the whole sub-continent of South-East Asia with the annual production of 1.5 million tonnes in 2012 [1]. Due to its immense economic, ecological and cultural importance it received lot of research interest in different areas including culture [2], breeding [3] immunology [4], disease ecology [5], reproductive physiology [6] and nutrition [7]. One of the major problem in this species is highly monsoon dependent breeding habit [8] and inability to breed in confined pond water without hormonal induction [9]. This restricts seed production beyond the breeding season leading to suboptimal utilization of cultivable water area. Like other Indian teleost the reproductive cycle of L. rohita may be divided into four stages, preparatory period (February-April), pre-spawning period (May-June), spawning period (July-August), and post-spawning period (September-January) [10] and at each stage gonads show discrete change. Unprecedented summer temperature, scanty, irregular and shifted monsoon in recent years has further complicated seed production by affecting one or more reproductive stages. However, common carp (Cyprinus carpio) is from the same cyprinid group, shows prolific breeding in the similar environmental condition. Attempt has been made to study the effects of environmental manipulations on gonadal development [11] through dietary manipulation [12], multiple induced breeding [13], offseason breeding [14] of Indian major carp etc. Although literatures on different aspects of reproduction and breeding are available [3], the genetic mechanism underlying gonad maturation and seasonal breeding in tropical climate has not been fully understood. Genetic studies have been performed in the past decade, which focused on selective breeding [15], development of genetic markers [16], development of linkage map [17] and collection of immune related genes [4]. Reproduction of fish is centrally controlled by brain, pituitary liver and gonad (BPGL axis) [18] and multiple genes are involved in this process. Addressing issues like gonad maturation; spawning and seed production under changing climatic conditions will require comprehensive information on a number of reproduction-related genes. In our earlier attempt 182 reproduction-related genes were identified from 4,642 high-quality ESTs by Sanger sequencing [6], but the number was less. High throughput next generation sequencing technologies provide the platforms to generate transcriptome sequences with much lower cost than traditional Sanger method, thus the transcripts generated by NGS technology can boost genetic and genomic research of relatively lagging species [19]. Until the reference genome sequence becomes available, transcriptome sequencing is a fast and efficient means for gene discovery and genetic marker development. The simple sequence repeat (SSRs) markers are important resources for determining functional genetic variation and among the various molecular markers, SSRs are highly polymorphic [20], and serve as rich resource of diversity. SSRs derived from expressed sequence tags (ESTs) have special advantages such as those might be linked to known genes, having higher transferability among related species, lower cost for development and higher proportion of high-quality markers [20]. Transcriptome resources for reproductive tissues are currently available for other commercially important fishes, including channel catfish (Ictalurus punctatus) [21], common carp (Cyprinus carpio) [19], zebrafish (Danio rerio) [22], rainbow trout (Oncorhynchus mykiss) [23,24], coho salmon (Oncorhynchus kisutch) [25], tilapia (Oreochromis mossambicus) [26], Atlantic halibut (Hippoglossus hippoglossus) [27], senegalese sole (Solea senegalensis) [28], Atlantic salmon (Salmo salar) [29], and cod (Gadus morhua) [30] but less information is available for L rohita [5,6].
Hence, this work was taken up with the primary objective to generate transcripts sequence using mRNA-seq and provide well-assembled transcriptome sequences from the pooled RNA samples of brain, liver, intestine, kidney, tongue, nose, eye, gill, muscle, heart, ovary and testis tissues to identify reproduction relevant genes, verify their association with reproduction by transcript expression pattern and to develop EST-SSR markers within these transcripts of L rohita which may be utilized further in future for genetic diversity analysis, gene mapping, marker-assisted breeding as well as studying reproductive issues in this species.

Ethics Statement
This study was approved by the Animal Ethics Committee of ICAR-CIFA, Bhubaneswar. All the fishes (rohu, Labeo rohita) used in the experiments were handled according to the prescribed guidelines of the Institute.

Maintenance of Animals and tissue collection
The brood stock of rohu as well as common carp was reared in CIFA farm ponds under carp breeding unit following standard procedures [3,31]. The stocking density was maintained at the rate of 1500kg/ha with 1:1 ratio of rohu and common carp (Lat.20°1'06"-20°11'45"N, Long.80°5 0'52"-85°51'35"E). Physico-chemical parameters of water were monitored routinely during the entire course of investigation. The water temperature, pH, D.O., total alkalinity and total hardness were 28-30°C, 7.5-8.5, 100-140 ppm, and 100-130 ppm, respectively. Adult males and females of L rohita (800-1200g) were collected during May-June, (pre-spawning). The fishes were euthanized with MS-222 at 300 mg/L before dissection. Brain, liver, intestine, kidney, tongue, nose, eye, gill, muscle, heart, ovary and testis tissues were collected from minimum five fishes for each tissue, quickly frozen in liquid nitrogen and stored at -80°C, until used for RNA extraction.

Illumina sequencing and quality controls
Total RNA was extracted from 12 different tissues (50-100mg) following the Guanidium Thiocyanate method [32] using the TRIzol-Reagent (Invitrogen, Carlsbad, CA, USA) according to manufacturer's instructions. RNA samples were then treated with RNase free DNase I (Qiagen) to remove potential genomic DNA. RNA integrity was initially checked in 1% denaturing gel. The RNA samples showing clear separation of 28S and 18S bands in the gel and spectrophotometric (Varian Cary 50 Bio) A 260/280 absorption ratio greater than 1.9 were taken for further work. RNA size distribution and integrity were further analyzed in Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA) and samples having RIN (RNA Integrity Number) more than 7.0 were used for library preparation. One paired-end (PE) cDNA library was generated from the pooled total RNA (4μg) of rohu tissues in equal quantity using mRNA-Seq assay for transcriptome sequencing on Illumina Genome Analyzer II platform. The library was constructed according to the Illumina TruSeq RNA library protocol outlined in "TruSeq RNA Sample Preparation Guide" (Part # 15008136). The prepared library was quantified using Qubit (Invitrogen) and validated for quality by running an aliquot on High Sensitivity chip (Agilent, Cat # 5067-4626) on Bioanalyzer 2100 (Agilent technologies, Santa Clara, CA, USA). Sequencing was done in one lane to generate 72bp PE reads. The raw sequences generated by Illumina Genome Analyzer were processed with SeqQC-V2.1 (In-house tool kit of Genotypic Technology) for various quality controls, including filtering of high-quality reads based on the score value, removal of reads containing primer/ adaptor sequences and trimming of read length.
rohu. Publicly available program, velvet (version 0.7.62; http://www.ebi.ac.uk/~zerbino/velvet/ ) was used to generate a non-redundant set of transcripts which has been developed for assembly of short reads using de-Bruijn-graph algorithm. Various assembly parameters were also optimized for best result. The trimmed high-quality sequence reads were assembled using velvet program at different (Hash length) k-mer length such as 31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61 and 65 with various output parameters like number of used reads, nodes, total number of contigs, contigs longer than 100 bp, N50 length, longest contig length and average contig length as a function of k-mer. Assembly by velvet was followed by oases (version 0.1.8; http://www.ebi.ac.uk/~zerbino/oases/), which has also been developed for de novo assembly of short reads, which takes the assembly generated by velvet as input and exploits the read sequence and pairing information to obtain better contigs/transcripts particularly to get the isoforms.

Similarity search and functional annotation
Putative function of the velvet assembled contigs was deduced by using them as queries against the UniProtKB/SwissProt database and the non-redundant (nr) protein database in the BLASTX program. The cut off E-value was set at 1e-6 and only the top gene-ID and name were initially assigned to each contig. Gene ontology (GO) annotation analysis was performed in Blast2GO (http://www.blast2go.org/) version 2.5.0 for the assignment of gene ontology terms. The nr BLAST result was imported to Blast2GO. The final annotation file was produced after gene-ID mapping, GO term assignment, annotation augmentation and generic GO-slim process. The annotation result was categorized with respect to Biological Process, Molecular Function, and Cellular Component at level 2. Pathway analyses of unique sequences were carried out based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database using the online (http://www.genome.jp/tools/kaas/) KEGG Automatic Annotation Server (KAAS) by using Bi-directional Best Hit (BBH) method. Enzyme commission (EC) numbers were obtained and used to putatively map protein sequences to a specific biochemical pathway.

Mapping and assessment of sequence reads onto rohu transcripts
To assess transcript and exon distribution in the genome, all the transcripts of oases assembly were mapped to the complete genome of zebrafish (zv9) and all the transcripts with significant hits were plotted by zebrafish chromosome number. For this, putative rohu non redundant transcripts were aligned with the reference sequences of zebrafish genome (zv9) using top hat alignment programme, compared and mapped in digital gene expression file using cufflinks program. In addition, the coverage of each transcript was determined in terms of number of fragment per kilobase of exon per million fragment mapped (FPKM). Further to assess transcript distribution in the genome of other species, the non redundant transcript data set of oases assembly was further subjected to BLAST against the mRNA databases of zebrafish (Danio rerio), salmon (Salmo salar) and catfish (Ictalurus punctatus) separately in NCBI. Those transcripts having more than 70% identity and 50% query coverage (mRNA) were taken for further analysed to get the mRNA annotated transcripts, mRNA overlapping transcripts and list of mRNA matching in that particular species.

Identification of orthologous genes involved in reproduction
To identify the genes and transcripts that play important role in reproduction process, the transcripts associated with GO terms under reproduction (GO: 0000003), hormone activity (GO: 0005179), receptor binding (GO: 0005102), receptor activity (GO: 0004872) and embryonic development (GO: 0009792) were selected.

Identification of EST-SSRs, SNPs and validation of microsatellite containing transcripts
To identify all the simple sequence repeats in assembled transcriptome of L rohita, perl script programme MISA (http://pgrc.ipk-gatersleben.de/misa/) was used. The mono-nucleotide repeats (more than 10 times), di-nucleotide (more than 6 times), tri, tetra, penta-nucleotide (more than 5 times) were considered as search criteria in MISA script. Maximum number of bases interrupting between two SSRs in a compound microsatellite was taken as 100. For the analysis of microsatellite polymorphism, DNA from 3 parents belonging to two linkage mapping panels of rohu [33] and 2 individuals (one each) from resistant and susceptible lines of rohu against aeromoniasis [5] was used to test the gene loci for diversity. Number of alleles, observed heterozygosity (HO), expected heterozygosity (HE) and polymorphic information content (PIC) were estimated using CERVUS software ver 3.0 (http://www.fieldgenetics.com/ pages/home.jsp). The difference between observed and expected heterozygosity was tested using chi square test by SAS 9.2 version for significant deviation. Further, to study the presence of SSRs in other species, corresponding gene sequences from zebrafish as well as common carp were downloaded from the databases and SSRs were searched in these sequences as described for rohu. To identify SNPs in rohu contigs, reads were mapped with the complete genome of zebrafish (zv9) using Bowtie-0.12.7 and variations were detected by Sam tools 0.1.7 with maximum variant count 10.

Putative ORF sequences searching
Unidentified sequences (no match found in BLASTX and BLASTN) among unique sequences were analyzed by star-orf software (http://web.mit.edu/star/orf/runapp.html) using parameter of 80bp minimal ORF length to search for putative ORF proteins, which could be used to distinguish between coding and non-coding sequences. Once the start codon, coding sequences, stop codon and poly (A) tail were identified, the cDNA sequences were considered a full-length cDNA, all the possible frames were searched against the protein database using BLASTP tool of NCBI.
Expression analysis of selected orthologous gene using quantitative real-time PCR during preparatory and post-spawning phases For the quantitative real time study, common carp and rohu were collected from the same earthen pond having same aquatic environment, as mentioned previously in animal maintenance section. Relative expression level of 19 transcripts (13 known and 6 unknown genes with putative ORFs) were measured by real-time PCR in brain, liver, pituitary, ovary and testis tissues collected from 15 individuals during initiation of gonad maturation (preparatory) and resting phase (post-spawning) each, using β-actin as a reference gene (GenBank accession no. EU184877) from rohu. Similarly, brain, liver, pituitary and ovary from common carp (Cyprinus carpio) was also collected from 15 individuals during preparatory phase for comparison with rohu. Equal quantity of tissue from each individual was pooled into three sets (5 fishes in each set) and RNA was extracted for each pooled tissues following the Guanidium Thiocyanate method [32] using the TRIzol-Reagent (Invitrogen, Carlsbad, CA, USA) according to manufacturer's instructions. First strand cDNA was synthesized using M-MLV reverse transcriptase (Finnzymes, Vantaa, Finland). Validation of transcript specific primers (S1 Table) was checked by normal PCR and band intensities for different tissues were observed in agarose gel electrophoresis with β-actin as control (data not shown). The real time PCR amplifications were carried out using a Light Cycler 480 (Roche, Germany) with a negative control with no template.
Real time PCR was repeated twice for each tissue for each sample. The crossing point, Cp values were acquired for both the target and reference gene using software version LCS480 1.5.0.39 of Light Cycler 480 (Roche, Germany). The relative transcript level of each transcript in each tissue was calculated by normalization of the value with the corresponding reference and compared among them using Cp values for brain cDNA as positive calibrator [34]. Comparison of relative expression level of each transcript between the two reproductive phases in individual tissue as well as between the two species was analyzed in REST 2009 software and the whisker-box plots were extracted with 2000 time iterations (http://www.REST.de.com).

De novo assembly
A total of 74,725,656 PE (37,362,828 from each end) raw sequence reads with each 72 bp length were generated using Illumina Genome Analyzer II, encompassing about 16 GB of sequence data in fastq format. The raw reads produced have been deposited in the NCBI SRA database (accession number: SRA051586). After filtering the sequence data for low-quality reads at higher stringency and reads containing primer/adaptor sequence, resulted in a total of 64,971,614 (87%) high-quality sequence reads (more than 70% of bases in a read with more than 20 phred score). The final data set comprising 64.97 million high-quality reads was used for optimization of de novo assembly and analysis of rohu transcriptome (Table 1). N50 length of the contigs generated using velvet assembly program varied from 454 to 1309 of different k mer (31-65) values (Fig 1). Out of these the best k mer value found was 37, as it resulted in highest N50 length of 1309 bp, largest contig length of 16,961 bp and largest average contig length of 709 bp. The assembly resulted in a total of 62,283 contigs containing 8 contigs of 10 Kb, 13,707 contigs of 1 Kb, 26,145 contigs of 500 bp with a minimum of 100 bp lengths ( Table 2). The total number of reads used for the assembly was also highest (69.61%) for k mer value of 37. The assembly of contigs generated by velvet at k-mer 37 was used again as input data in oases with default parameters. This resulted in a total number of 88,612 transcripts in comparison to the 62,283 contigs resulted from velvet.

Functional annotation and identification of reproduction-related genes
A total of 31,637 contigs had significant BLASTX hit corresponding to 17,925 unique protein accessions in the nr protein database. Gene ontology (GO) analysis of these 17,925 unique proteins resulted in a total of 78,317 annotations/GO terms including 33,343 (42.57%) biological process terms, 23,479 (29.98%) molecular function terms and 21,495 (27.44%) cellular component terms (Fig 2). Among the biological process category 7,841 and 7,375 genes were related to metabolic (GO: 0008152) and cellular processes (GO: 0009987) respectively; a significant numbers of genes were also identified from development process (2,534) and growth (248) sub-categories. Similarly under molecular function category, 11,798 genes were involved in the binding process (GO: 0005488) and 6,784 genes in the catalytic activity (GO: 0003824); whereas under the cellular component category, 11,533 genes from cell (GO: 0005623), and 6,500 genes corresponded to organelle (GO: 0043226) were the most represented categories. A total of 940 reproduction relevant genes were identified (Fig 3), among which 184 were mainly associated with reproduction related proteins (GO: 0000003), 223 related to hormone activity (GO: 0005179) and receptor binding related proteins (GO: 0005102), 178 receptor activity related proteins (GO: 0004872) and 355 embryonic development related proteins (GO: 0009792). Details of 940 reproduction related gene orthologues are given in S2 Table. Mapping of rohu transcripts with zebra fish, salmon, and catfish   23 showed maximum number of match with rohu transcripts (1998 and 1487) and exons (5166 and 4267) respectively (Fig 4). On the other hand, out of 88,612 oases based rohu transcripts when compared with zebra fish, salmon, and catfish mRNA showed match with 31,091, 4,894 and 1,071 numbers of annotated transcripts respectively (Table 3). Interestingly it also showed 794 transcripts common in all these species (S1 Fig).  KEGG-pathway analysis for identification of reproduction related pathways KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis was performed on all assembled contigs as alternative approach for functional categorization and annotation. Several KEGG pathways were represented by more than 200 unique transcripts. Enzyme commission (EC) numbers were assigned with 5,938 enzyme codes for 8,683 unique sequences (Table 4). Briefly, 2,269 (26.13%) were classified into the metabolism, 1,365 (15.72%) sequences grouped into the Genetic information processing (GIP), 1,188 (13.68%) unique sequences come under Environmental information processing (EIP), 1,283 (14.77%) unique sequences under cellular processes and finally 2,578 (29.69%) sequences under organismal systems had match with KEGG annotation (Table 4). Among reproduction-relevant pathways, GnRH signalling pathway (

Identification of EST-SSRs and SNPs
A total of 22,383 EST-SSRs were identified in 17,244 transcripts of rohu with a frequency of one SSR per 3.41 kb of the sequence ( Table 5). The mono-nucleotide repeats represented the largest fraction (60.33%) of SSRs identified followed by di-nucleotide (19.21%) and tri-nucleotide (9.83%) repeats. Only a small fraction of tetra-(390), penta-(70) and hexa-nucleotide (23) repeats were identified in rohu transcripts (Table 5 and S2 Fig). In total, there were 1893 compound repeats. Screening of microsatellites from reproduction relevant known genes revealed 29 different microsatellite containing sequences with different repeat motifs (Table 6). From 29 reproduction relevant microsatellite containing sequences 51 primers were designed from the flanking region, out of which 38 primers were selected for testing, but 32 (84%) primers showed PCR amplification. Checking further with these 32 loci in the mapping panel parents, twenty loci were found to be monomorphic and twelve polymorphic. The genetic diversity measures   using CERVUS indicated mean expected and observed heterozygosity to be 0.471 and 0.983, respectively. Mean polymorphism information content was found to be 0.35 (Table 7). The observed heterozygosity was lower for the gene loci, SP02, SP04 and SP12, while it was higher in gene loci SP08, SP09, SP14, SP15, SP20 and SP21 than the expected. The difference between expected and observed heterozygosity was not significant (P>0.05). No linkage was detected among the loci. The difference between mean expected and mean observed heterozygosity was not significant (P>0.05). Observed and expected heterozygosity differed at individual locus level (Table 7). Corresponding to 29 microsatellite containing sequences in rohu 26 gene sequences were found for zebrafish and the matching percentage was greater (74%-98%) at nucleotide level, as compared to only 3 sequences found for common carp with lower (10%-91%) identity. Microsatellites could be observed in six zebrafish sequences but not in the common carp sequences (S3 Table). A total of 52925 SNPs were identified in 19402 transcripts of rohu (Table 8), out of which 1827 were homozygous and 51098 heterozygous (S4 Table).

Discussion
Sequencing, assembly and mapping of Labeo rohita transcriptome Sequencing and characterization of transcriptome of non model species using RNA-seq is one of the most important applications of NGS technologies [35]. The de novo assembly of short reads without a known reference is considered difficult [36]. In the present study different kmer results suggested that k-mer length affects inversely to the number of contigs. Out of these the best k-mer value obtained was 37, as it resulted in highest N50 length of 1309 bp, with   largest contig length of 16961 bp and average contig length of 709 bp (Fig 1). Similar findings were reported in de-novo assembly of Chickpea transcriptome [36]. 62,283 numbers of contigs with average length of 709bp as compared to 40,596 numbers of contigs with average contig length of 308bp from rohu [5], indicated better quality of reads in the present study. Out of 62,283 contigs, 50.79% (31,637) provided significant BLASTX hit corresponding to 17,925 unique protein accessions in the nr protein database, however, 47.4% contigs showing annotation from rohu in previous study [5]. It also revealed that, about 50% of transcripts did not match in any databases thus are less likely to cover coding regions, or belongs to novel sequences [37]. BLASTX top-hit species distribution of gene annotations showed highest homology with the Danio rerio, followed by Oreochromis niloticus, and Cyprinus carpio with L rohita and lesser homology with Salmo salar, Ctenopharyngodon idella, Carassius auratus and Ictalurus punctatus (Data not shown). Probably, these results indicated a high level of similarities and conserveness of the L rohita gene content with Danio rerio and Cyprinus carpio than with Salmo salar and Ictalurus punctatus. However, higher homology of L rohita (cyprinidae) sequences with Oreochromis niloticus (cichlidae) as compared to Ctenopharyngodon idella and Carassius auratus (cyprinidae) may be explained on the basis of the fewer number of genes that are currently available in the NCBI database for these species as was also reported in rainbow trout [37]. Blast analysis of rohu transcripts against the mRNA databases showed maximum (31,091) match with zebrafish, followed by salmon (4,894) and catfish (1,071) which further indicated closer relation of rohu with zebrafish (S1 Fig). 27,693 contigs in common carp were found significantly matching with refseq proteins of zebrafish [22]. Based on the above results rohu transcripts were chosen for chromosome wise mapping with complete genome of zebrafish (zv9), which showed that rohu transcripts were distributed in all 25 chromosomes of zebrafish (Fig 4), indicating 25 numbers of chromosomes are also present in rohu [38]. Among them, chromosome 5 and 23 of zebrafish (zv9) showed maximum similarity with rohu transcripts while it was with chromosome 7 and 5 of zebrafish (zv9) in case of common carp transcripts [19]. Out of 17,925 gene orthologues found, a total of 940 important reproduction-relevant genes under reproduction, hormone activity, receptor binding, receptor activity and embryonic development were analyzed and reported for the first time in L rohita from the pre-spawning phase. These transcripts are important because ovarian recrudescence, responsiveness and stimulatory effect to both steroidogenic and gametogenic functions of the gonad normally occurs during the pre-spawning period [10]. Similarly, 2852 genes involved in maturation and development of the ovary from rainbow trout [21], 474 genes from ovary of tilapia [23], 2341 genes from atlantic halibut related to quality of egg parameters [25], 1200 genes from rainbow trout and clawed toad during late oogenesis [21], 275 genes in coho salmon from ovary during primary and early secondary oocyte growth [23] were also reported.

Analysis of reproduction relevant pathways in Labeo rohita
Transcriptome studies help in gene discovery and provide novel insight into various unique species-specific biological process/pathways [36]. The 8,683 sequences (Table 4) along with 328 different enzymes/orthologues representing nine important reproduction-relevant pathways obtained in KEGG analysis may serve as valuable resources for future gene identification and functional analysis, as well as development of microarray for reproduction research for this species. Among different pathways mapped, GnRH signaling is the major reproductionrelevant pathways identified. Reproduction in fishes is dependent on the coordinate actions of various hormones in which gonadotropin-releasing hormone (GnRH), acts as master regulator via the hypothalamic-pituitary-gonadal (HPG) axis [39]. Therefore GnRH signaling pathway study in this species will be of key interest in the context of its seasonal (monsoon) nature of breeding under tropical climate in comparison to prolific breeder like zebrafish [40]. Out of 121 genes mentioned in this pathway in zebrafish, about 40 genes are captured in L rohita, which is quite significant and not reported earlier.
Generally, immature oocytes are developed into fertilizable eggs through meiotic maturation induced by specific hormones [41]. While progesterone-mediated oocyte maturation is the major studied pathway in xenopus, it can also be induced by other steroid hormones and the pathway may differ from animal to animal [42]. 46 out of 107 genes of progesterone-mediated oocyte maturation pathway, and 53 out of 137 genes in oocyte meiosis pathway were mapped in rohu in the present study.
Steroid biosynthesis and steroid hormone biosynthesis pathways are important for fish reproduction. The steroid hormones are all derived from cholesterol [43]. Numerous organs are known to have the capacity to synthesize biologically active steroids, including the adrenal gland, testis, ovary, brain, placenta, and adipose tissue. 14 out of 28 genes from steroid biosynthesis pathway and 15 out of 38 genes from steroid hormone biosynthesis pathway of zebrafish were reported for the first time in rohu, which are of immense value for future reproductive biology study.
In retinol metabolism pathways, retinal is the predominant retinoid in eggs and oocytes of marine fish as well as some freshwater fish species and may constitute almost the entire pool of retinoids in these eggs [44]. The cellular pathways for retinoid metabolism are mainly known from studies in mammalian species. These pathways are highly conserved and principally identical among all classes of vertebrates, probably also in fish [44]. Out of 38 genes from retinol metabolism pathways of zebrafish, 17 genes were observed in L rohita.
Similarly neurotrophins are growth factors implicated in the development and maintenance of different neuronal populations in the nervous system [45]. No neurotrophin signaling pathways are reported for zebrafish in KEGG database; however, in the present study, 67 genes involved in neurotrophin signaling pathways for rohu were found.
Photoperiodic manipulation has emerged as an effective tool of reproductive management in culture fisheries, and understanding the physiology of photoperiodic regulation of fish reproduction became the priority topic of research in different countries [11]. So study of photo-transduction pathways for this species is of immense value. Recently, significant advancement of gonad maturation of Indian major carp species (rohu, catla and mrigal) was possible through photothermal manipulation [14]. For the photo-transduction pathway in rohu, out of 38 genes, 13 genes were mapped from KEGG pathway databases.

Expression analysis of reproduction relevant genes by real time PCR
Some of the important gene orthologues i.e, vitellogenin receptor, insulin receptor b, fgg protein, green sensitive cone opsin, steroid receptor homolog svp 46, prolactin, activin receptor, spermatogenic glyceraldehyde-3-phosphate dehydrogenase, semaphorin 3fa, follistatin-like 2, cathepsin-Z and estrogen receptor binding site associated antigen 9 variant 1, were analyzed by qPCR in preparatory and post-spawning period. All these transcripts were collected during pre-spawning phase, but preparatory and post-spawning phases are equally important to know the events in initiation of gonad maturation in Indian carps [14] and accumulation of any transcript in these stages indicate its role in maturation.
Vitellogenin synthesized and released from the liver, is carried through blood and taken up into oocytes by the vitellogenin receptor; which is an essential process in oviparous animals to ensure successful reproduction [46]. Although different isoform of vitellogenins [47] and their role in gonad maturation [48] are found in the literature, nothing is reported about vitellogenin receptor and their role. Expression of vitellogenin receptor was down regulated in most of the tissues studied (except in testis) in preparatory phase as compared to post-spawning in L rohita. Amounts of vitellogenin receptor were less in individuals with yolked oocytes (ripening stage, May-June) and increased after spawning in July in Atlantic bluefin tuna (Thunnus thynnus L.) [49]. Generally Insulin is implicated in growth, development and reproduction in teleosts [50] and expression of different insulin receptor genes were studied [49]. In L rohita expression of insulin receptor b was found down regulated particularly in liver and almost no change in other tissues during preparatory phase as compared to post-spawning, whereas high level of Insulin receptor b mRNA was reported in ovary in rainbow trout [51]. The role of fibrinogen-γ (FGG) in uterine epithelial cells during normal pregnancy, pseudo-pregnancy and in hormone-treated rats is suggested [52]. Fibrinogen gamma chain expression during preparatory phase was significantly lower in brain and testis than in post-spawning phase and no variation observed in other tissues in rohu. However, in zebra fish higher expression was observed in the embryonic yolk syncytial layer than in the early cells of the developing liver [53]. Photoreceptors mainly consist of rods and cones. In the retina of diurnal primates, cones are further subdivided into three subtypes, the red-, green-, and blue-sensitive cones, whose visual pigments are maximally sensitive to long, middle, and short wavelengths, respectively [54]. Significantly lower expression of green sensitive cone opsin was observed in brain, liver and testis during preparatory phase in L rohita, where as expression of Pi-green1 and Pi-green2 cone opsins were observed in skin and lateral eyes in Paracheirodon innesi [55]. Steroid receptor homolog svp 46 showed significantly reduced expression in brain, liver and testis while spermatogenic glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and Semaphorin 3fa both were down regulated in brain, liver and pituitary during preparatory phase in L rohita. In human GAPDH protein was expressed in both sertoli cells and elongated sperms [56], whereas high expression levels of semaphorin 3fa domains were observed in human oocyte from the earliest follicle stages [57] but no report was available about their expression level in any fish species. Expression level of follistatin-like 2 was significantly lower in ovary during preparatory phase, whereas in xenopus it is reported as an early gastrula expressed gene [58]. Cathepsin-Z level was significantly higher in ovary in post-spawning rohu while abundant cathepsin-Z expression was observed in rainbow trout in low quality unfertilized eggs [59]. In most vertebrates, 11-betahydroxysteroid dehydrogenase b2 is essential for conferring aldosterone-specific actions in mineralocorticoid target tissues and for protecting glucocorticoid-sensitive tissues during stress [44]. Expression level of 11-beta-hydroxysteroid dehydrogenase was significantly less in brain and liver during preparatory phase in rohu; whereas expression of this transcript was observed nearly in all peripheral tissues in zebrafish [44]. Prolactin plays important roles in freshwater fish reproduction [60] and seasonal acclimatization [61]. Expression level of prolactin was significantly lower in brain, liver and testis during preparatory phase compared to post-spawning in L rohita. However in Cyprinus carpió higher level of mRNA expression was noticed in summer carp pituitary, as compared to winter carp [61]. Activins are critical components of the signaling network that controls female reproduction and different receptors control their roles and functions in hypothalamus [62]. Activin receptor level was significantly low in brain, liver and pituitary during preparatory phase of L rohita, whereas in Ctenopharyngodon idella activin receptor transcripts shows high expression levels in extra-gonadal tissues, including pituitary, brain, and liver [63]. Report of mRNA encoding estrogen receptor binding site associated antigen 9 variant 1 is present in mammalian oocytes [64], but nothing is known about either expression pattern or function in oocytes during maturation, fertilization, and subsequent embryonic development. A significantly higher expression level of this transcript was found in ovary and testis in preparatory rohu in the present study.
Almost all the unknown transcripts were down regulated in brain, liver and ovary (Fig 7), and node-19676, node-20067 and node-20271 were up-regulated in testes during preparatory phase. Many unknown transcripts or novel sequences (34.7%) were also reported during transcriptomic analyses in zebrafish gonad and brain [65]. Searching of similar sequences for the unknown transcripts in the present database of common carp was not successful, and always showing only a few bases matching both at BLASTN, tBLASTN as well as in BLASTX and BLASTP as was reported in our previous study [6], but expression of all these transcripts are also observed in these prolific breeders and significantly higher expression (p<0.001) of node-7314 was noticed in common carp pituitary. Level of expression of almost all the 6 unknown transcripts were significantly higher in seasonal breeder rohu in comparison to carp which indicates that there may be some possible role of these transcripts in reproduction of rohu as was reported in our previous study also [6].

Development of EST-SSR markers and SNPs and validation of EST-SSR markers
Identification of large number of SSRs in the L rohita transcripts with frequency of one SSR per 3.41 kb of the sequence is very interesting and will enrich the existing marker resources of rohu to facilitate genetic improvement of this species.
Twelve microsatellite loci associated with the genes involved in reproductive process showed polymorphism in the present study out of 29 identified, while twenty repeat loci were reported to be polymorphic from 128 loci in our previous study [6]. The higher percentage of polymorphic loci in this study could be due to the high polymorphism between mapping parents. However, a high attrition rate of potential microsatellite marker is generally observed in case of EST-SSRs, starting from primer design step following identification of a simple sequence repeat, through agarose gel analysis, to polyacrylamide gel optimization and final analysis. The relatively low success rate of primers showing good amplification may be due at least in part to high intra-specific polymorphism in the L rohita genome, as was seen in the analysis of flanking region sequences in this study. Similarly Cameron et al., [66] attributed variation in amplification efficiency of sea urchin microsatellite loci to a high level of genomic polymorphism.
Presence of repeats in some of the corresponding zebrafish sequences indicated that SSR may be present in these sequences across the species although it could not be confirmed in common carp due to lack of sequences in databases. A total of 52925 SNPs including 1827 homozygous and 51098 heterozygous were identified in rohu by comparing with zebrafish (zv9) genome, as no reference sequences are reported for rohu in databases. SNPs were reported for disease resistant and susceptible lines of rohu [17] but the SNPs detected in the present study need further validation.

Identification of Isoforms
It has been suggested that assembly of velvet followed by oases yields better contigs/transcripts to produce transcript isoforms [36]. Production of 88, 612 transcripts by oases analysis in comparison to the 62,283 contigs resulted in velvet, suggest that rohu transcript showing isoforms. Observation of gene isoforms from reproduction related transcripts in L rohita is an interesting finding. Seven isoforms of activin receptors were found in rohu, while six variants were observed in grass carp [63]. TATA box binding proteins have two isoforms in rohu; similar splice variant was observed in human, encoding the polyglutamine-containing N-terminal domain that accumulates in Alzheimer's disease [67]. However no isoforms for these proteins were reported in any other fish species. Transferrin receptor showed two isoforms in rohu, which is quite similar with other vertebrates [68]. Thyroid hormone receptor showed seven isoforms in rohu while two thyroid hormone receptor-α genes (thraa, thrab) were found in zebrafish [69]. Seven isoforms of progesterone receptor membrane component was observed in rohu in contrast to three forms reported in channel catfish [70]. Six, four and three isoforms are observed for Prostaglandin synthase, Beta-galactosyltransferase and Semaphorin respectively in rohu, but no isoforms are reported in other fish species. Three isoforms were found for retinoic acid receptor in rohu as compared to seven isoforms in zebrafish [71]. Six isoforms for estrogen-related receptor and two isoforms for cadherin were found in rohu, number of which varies in mammals [72] and zebrafish [73]. Present study revealed six isoforms of insulin receptor in rohu, whereas zebrafish expressed two isoforms of it (insra and insrb) [74]. Nuclear receptors are a class of proteins found within cells that are responsible for sensing steroid and thyroid hormones and certain other molecules and in rohu twelve isoforms of nuclear receptor were observed. Talin showed two isoforms in rohu, where as talin-1 and talin-2 in model vertebrates produces two talins through alternative mRNA splicing [75]. DNA methyltransferase showed two isoforms both in rohu and zebrafish [76]. It is necessary to mention here that these isoforms in rohu are the product of next generation sequencing in which the sequences were generated as short read 75bp sequence followed by assembly of them by the software. Therefore validity of all these isoforms are highly essential for final conclusion.

Conclusion
Production of 62,283 high-quality L rohita transcriptome derived from brain, pituitary, liver, intestine, kidney, tongue, nose, eye, gill, muscle, heart ovary and testis tissues from pre-spawning phase will contribute a significant non-redundant set of ESTs resources. Out of 17,925 important gene orthologues found, a total of 940 reproduction-relevant genes were analyzed and reported for the first time in L rohita. In KEGG analysis, 8,683 well-categorized, annotated transcriptome along with 328 different enzymes/orthologues representing nine important reproduction-relevant pathways were obtained. A total of 22,383 SSRs were identified in 17,244 transcripts of rohu and 12 polymorphic loci were identified from 29 reproduction related genes. Difference in tissue expression levels of 13 known genes and 6 unknown putative genes indicates variation between preparatory and post-spawning phase of these transcripts in monsoon breeder carp L rohita. Isoforms for several reproduction related gene transcripts in L rohita is an interesting finding. These data may serve as important and valuable resources for L rohita genetics and genomics which will be beneficial as a reference set for the production of large-scale transcriptome study in future as well as for gene identification and functional analysis for rohu reproduction.