Comparative Transcriptome Analysis Using High Papaverine Mutant of Papaver somniferum Reveals Pathway and Uncharacterized Steps of Papaverine Biosynthesis

The benzylisoquinoline alkaloid papaverine, synthesized in low amount in most of the opium poppy varieties of Papaver somniferum, is used as a vasodilator muscle relaxant and antispasmodic. Papaverine biosynthesis remains controversial as two different routes utilizing either (S)-coclaurine or (S)-reticuline have been proposed with uncharacterized intermediate steps. In an attempt to elucidate papaverine biosynthesis and identify putative genes involved in uncharacterized steps, we carried out comparative transcriptome analysis of high papaverine mutant (pap1) and normal cultivar (BR086) of P. somniferum. This natural mutant synthesizes more than 12-fold papaverine in comparison to BR086. We established more than 238 Mb transcriptome data separately for pap1 and BR086. Assembly of reads generated 127,342 and 106,128 unigenes in pap1 and BR086, respectively. Digital gene expression analysis of transcriptomes revealed 3,336 differentially expressing unigenes. Enhanced expression of (S)-norcoclaurine-6-O-methyltransferase (6OMT), (S)-3′-hydroxy-N-methylcoclaurine 4′-O-methyltransferase (4′OMT), norreticuline 7-O-methyltransferase (N7OMT) and down-regulation of reticuline 7-O-methyltransferase (7OMT) in pap1 in comparison to BR086 suggest (S)-coclaurine as the route for papaverine biosynthesis. We also identified several methyltransferases and dehydrogenases with enhanced expression in pap1 in comparison to BR086. Our analysis using natural mutant, pap1, concludes that (S)-coclaurine is the branch-point intermediate and preferred route for papaverine biosynthesis. Differentially expressing methyltransferases and dehydrogenases identified in this study will help in elucidating complete biosynthetic pathway of papaverine. The information generated will be helpful in developing strategies for enhanced biosynthesis of papaverine through biotechnological approaches.


Introduction
Benzylisoquinoline alkaloids (BIAs) form one of the largest and most diverse groups of alkaloids with about 2500 members. Opium poppy (Papaver somniferum) considered as the only source for several high-value pharmaceutically important BIAs including the narcotic analgesic morphine, antitussive codeine, the potential anti-cancer drug noscapine, the antimicrobial agent sanguinarine and the vasodilator papaverine. The papaverine, synthesized in low amounts in most of the opium poppy varieties, is also used as smooth muscle relaxant [1], erectile dysfunction [2], treatment of intestinal and urinary tract spasms [3], migraine headaches [4] and schizophrenia [5]. All these effects are attributed to its inhibitory effect on phosphodiesterases by increasing cAMP levels [5], [6].
In case of papaverine biosynthesis pathway, very limited and controversial information is available. Feeding experiments suggested that (S)-reticuline and nororientaline as well as norreticuline to be possible pathway intermediates [19], [20]. A few putative pathways have been proposed with these molecules as intermediates. The first proposed pathway, NH pathway (involving N-desmethyl intermediates), for papaverine biosynthesis utilizes (S)-coclaurine with the involvement of 39-hydroxylase, and norreticuline 7-O-methyltransferase (N7OMT) [21]. The second proposed pathway, NCH 3 (involving several N-methylated intermediates), pathway begins with the conversion of (S)-reticuline to (S)-laudanine catalyzed by reticuline 7-O-methyltransferase (7OMT) [22]. As a result of subsequent 39-O-methylation, Ndemethylation and dehydrogenation papaverine is synthesized [23]. These subsequent 39-O-methylation, N-demethylation and dehydrogenation steps are essential in both routes to yield papaverine. Recently, through transcriptome sequencing and virus-induced gene silencing (VIGS) it has been suggested that intermediate steps from both previously reported pathways operate for papaverine biosynthesis [24]. Contrary to this, the same group reported that the major pathway to papaverine biosynthesis involves N-desmethyl intermediates and does not primarily proceed via (S)-reticuline [25]. Taken together, these studies suggest that papaverine biosynthesis still remains controversial more than a century after the pathway was first suggested.
The major limitation in understanding papaverine biosynthesis and other BIAs is due to limited genomic information about this plant. In different plants, two strategies, one to establish complete genome and other to generate ESTs have been helpful in understanding pathways and networks leading to biosynthesis of primary and secondary metabolites as well as plants behavior to different stresses. At present, total genome sequencing of poppy plant has not been initiated due to large genome size (7.4 Gbp) [26]. Initially, EST datasets have been utilized for gene discovery and to elucidate alkaloid biosynthesis pathway to some extent [11], [17]. Use of Next Generation Sequencing (NGS) technologies led to the establishment of transcriptomes of few poppy cultivars with differential BIA content and discovery of a number of previously unidentified genes [25], [27]. Recently, analysis of mutants for elucidation of biosynthesis pathways for specific alkaloids had been emphasized in addition to stepwise verification [18], [28]. Using this approach, metabolic block in top1 (thebaine oripavine poppy 1) mutant, which is defective in 6-O-demethylation of thebaine and oripavine [29], has helped in identifying genes involved in thebaine biosynthesis [17]. Studies also suggest that mutants selectivity allow exploration of targeted redundant pathway branches and may enable enhanced understanding for efficient production of high value compound using biotechnological approaches [18].
Here, we have characterized pap1 (naturally selected high papaverine mutant 1) mutant of opium poppy, which accumulates papaverine up to 6% of the latex with visible phenotype of white latex in comparison to normal cultivar (BR086) with 0-0.5% papaverine of normal pink latex. We established complete transcriptome of pap1 and BR086 using high-throughput 454 Genome Sequencer (GS) FLX platform with an objective to identify genes involved in the biosynthesis of papaverine. We report a number of differentially expressing unidentified transcripts between pap1 and BR086. Using this data we have identified members of methyltransferase (MT) and dehydrogenase gene families. Expression of differentially expressed genes has been validated through Real-Time PCR. On the basis of identified genes, we propose that identified genes may be potential candidates for elucidating papaverine biosynthesis in P. somniferum.

Materials and Methods
Plant material, cDNA library construction and sequencing Peduncle tissue (stem, 2 cm below the capsule) of prehook stage of pap1 mutant and BR086 cultivar of Papaver somniferum L. were collected from four plants grown in the experimental plot of the Institute. Frozen tissues were ground to a fine powder in liquid nitrogen and total RNA was extracted using Spectrum Plant Total RNA Kit (Sigma-Aldrich, USA) and treated with RNase free DNaseI (Ambion, USA) according to manufacturer's instructions. The quality and quantity of total RNA were analyzed by agarose gel and spectrophotometric analysis (ND-1000 Nanodrop, Nano-Drop Technologies, USA). The equal amount of total RNA from four different preparations was pooled and used for further processing for transcriptome sequencing. Double-stranded (ds) cDNA library was prepared using pooled total RNA and double stranded cDNA synthesis kit (Invitrogen, Carlsbad, CA) as per manufactures recommendations. Quantity as well as quality of (ds) cDNA library was checked on Agilent 2100 Bioanalyzer DNA chip (Agilent Technologies Inc., Santa Clara, CA). Random fragments of about 250-800 bp in length of (ds) cDNA were generated by nebulization and purified using QIAquick PCR purification spin columns (Qiagen, USA). Adapter ligated (ds) cDNA libraries were prepared using fragments above 300 bp according to manufacturer's instruction (Roche, USA). (ds) cDNA fragments were denatured to generate single-stranded cDNA fragments, which were amplified by emulsion PCR for further sequencing according to manufacturer's instructions (Roche, USA). The sequencing of cDNA libraries was performed on a 454-GS FLX sequencing platform (454 Life Sciences, Roche, USA) using GS FLX Titanium Kit.

De novo assembly and sequence annotation
Reads from pap1 and BR086 libraries were processed and trimmed for weak signals by GS FLX pyrosequencing software to yield high-quality (HQ) ($99.5% accuracy of single-base reads) sequences. The HQ reads were assembled using Roche GS Assembler (version 2.5.3) with 40 base pair overlap and 95% identity for independent libraries forming contigs and singletons. The contigs and singletons of both libraries (pap1 and BR086) were tagged and assembled together for the purpose of quantifying differential expression of unigenes. Total unigenes formed in different libraries were pooled and annotated using standalone version of BLASTx program against protein Nr database (http://www.ncbi.nlm.nih.gov; released on 06/23/ 2009), Arabidopsis protein database at The Arabidopsis Information Resource (TAIR; http://www.arabidopsis.org; version Tair9) and NCBI-CDD (Conserve Domain Database) database with an E-value cut-off of 10 25 and extracting only the top hit for each sequence.

Functional characterization and biological pathways assignment
To assign function to each unigene, gene ontology (GO) analysis was performed using GO annotation in online TAIR Database (http://www.arabidopsis.org/), which classified unigenes under the categories of Cellular component, Molecular Function and Biological Process. The TAIR IDs of all the unigenes (contigs and singletons) from pap1 and BR086 were retrieved from TAIR9 annotation. Each annotated sequence may have more than one GO term, either assigned in the different GO categories or in the same category [30]. To gain an overview of gene pathway networks, the assignment of unigenes from combined transcriptome into metabolic pathways were mapped according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) [31]. KO numbers were assigned to unique sequences, based on the BLASTx search of protein databases, using a cutoff E value 10 25 . The output of KEGG analysis includes KEGG orthology (KO) assignments and KEGG pathways (http://www.genome.jp/ kegg/) that are populated with the KO assignments.

Digital gene expression profiling
Relative gene expression level for different unigenes was calculated using number of reads per unigenes in combined assembly of pap1 and BR086 transcriptome. To carry out this, the contigs and singletons of pap1 and BR086 libraries were tagged, assembled together and annotated. Transcripts per million (tpm) were calculated for each contig formed for further analysis.
Relative expression values of unigenes in pap1 and BR086 samples from combined assembly were computed by dividing its 454-sequence reads by the total 454-sequence reads in both the samples. Differential expression of genes was calculated by using R value and fold change. To calculate the threshold R value, 1000 datasets for each library was generated according to the random poisson distribution as described in [32]. Log 2 converted values for each unigene was subjected in MeV (Multiple Experiment Viewer, Java tool for genomic data analysis, version 4.8.1). R value $8 and fold change $2 for a unigene was considered as differential expression. Hierarchical clustering (HCl) of log-transformed expression data was carried out using Euclidean Distance Matrix and Average linkage clustering method [33].

Real-time PCR analysis
Expression of putative genes involved in papaverine biosynthesis was analyzed through qRT-PCR using Real-Time PCR Detection System (ABI 7500, Applied Biosystems, USA) and fast SYBR Green PCR Master Mix (Applied Biosystems, Warrington, UK) to validate 454 sequencing data. Each PCR reaction was set up in total 20 ml volume containing 10 ml of Fast SYBR green master mix, 10 ng of cDNA sample prepared using RevertAid H minus first strand cDNA synthesis Kit (Fermantas, Life Sciences, Ontario, Canada) and gene-specific primers (Table S1). The PCR cycling conditions were as follows: 50uC for 2 min, 95uC for 2 min for initial denaturation followed by 40 cycles of 95uC for 15 s and 60uC for 60 s. Each time, reaction sets also incorporated a no-template control. We employed probes specific for the actin as references. The qPCR data was analysed with the DDCT method using reference gene. For each sample, the mRNA levels of target genes were normalized to that of the actin mRNA. All the experiments were repeated using three biological replicates and the data were analyzed statistically (6 Standard Deviation).

Alkaloid content estimation
Opium latex from capsules from field matured plants of pap1 and BR086 were collected and stored in DMSO for alkaloid extraction. Solvent containing extracted alkaloids was recovered and estimation of various alkaloids was carried out through separation using High Pressure Liquid Chromatography following the standard method [34], [35]. The average content of each alkaloid of at least three samples was calculated and data are expressed as mean +/2standard deviation (S.D.).

Results and Discussion
Enhanced papaverine biosynthesis in pap1 mutant Distinct phenotypes with respect to colour of fresh latex was observed upon lancing capsules of pap1 mutant in comparison to BR086 of P.somnifera. In contrast to pink latex produced in BR086, pap1 exuded white latex ( Figure 1a). Earlier, Khanna and Shukla (1991) [36] studied inheritance of papaverine content in selected opium poppy lines and established positive correlation between white fresh latex as morphological marker and high papaverine content. Fresh white latex exudation and papaverine content has also been suggested to be correlated with gene action through crossing of high papaverine line with BR086 line and analysis of six subsequent generations [37]. As naturally selected pap1 mutant exudes white latex on lancing, analysis of major morphinans (thebaine, codeine and morphine) and papaverine was carried out in pap1 mutant and BR086 (Figure 1b). No significant difference in thebaine, codeine and morphine content in latex was observed between pap1 and BR086. However, papaverine content was more than 12-fold higher (6% of the dried latex) in pap1 in comparison to BR086 line (0.48% of the latex). As pap1 mutant accumulated significantly high papaverine content without changes in accumulation in morphinans, we hypothesized that this may serve as good model to explore and elucidate papaverine biosynthesis pathway.
It has been suggested that early stages of BIA biosynthesis in P. somniferum occur in parenchyma cells associated with the vascular bundle and surrounding laticifers in aerial plant parts. After synthesis alkaloids accumulate into laticifers of capsule [38]. In addition to capsule, stem tissue that lies within 2-4 cm of the capsule is also a rich source of latex containing alkaloids. Immunolocalization of the alkaloid biosynthetic enzymes and major latex protein (MLP) from this area of stem suggested that complete machinery involved in the biosynthesis of alkaloids resides in this tissue [38]. Therefore, we used stem tissue below 2 cm of capsule (peduncle) for further transcriptomic analysis. Other studies have also utilized same tissue to carryout transcriptome analysis of different poppy varieties [29], [39]. Transcriptome sequencing and sequence assembly cDNA libraries from mRNA from peduncle tissue of pap1 and BR086 of P. somniferum were sequenced using half plate run for each on a 454-GS FLX Titanium platform. After initial quality filtering through trimming adaptor sequences and removing reads shorter than 50 bp, sequencing run yielded 770,126 (238 Mb) and 763,579 (238 Mb) high-quality (HQ) expressed sequence tags for pap1 and BR086 line respectively ( Table 1). The raw sequences files SRR403064 (pap1) and SRR403061 (BR086) were deposited at NCBIs Short Read Archive under SRA049668. Approximately 90% of the reads obtained were between 200 and 600 bp from both transcriptomes. De novo assembly of reads led to the construction of 26,657 and 32,313 contigs with average lengths of 579 and 557 bases from pap1 and BR086 cultivar, respectively ( Table 1). The size distribution of the 454-reads and assembled contigs is shown in Figure 2 (a and b). Among all the contigs, 46% pap1 and 44% BR086 contigs with lengths longer than 500 bp were considered large contigs. Apart from assembly as contigs, 100,685 and 73,815 reads with average length of more than 274 bp remained as singletons. GC content of unigenes from pap1 and BR086 was approximately 40% (Table 1).   Functional annotation and classification of transcriptomes The assembled sequences, contigs and singletons, from pap1 and BR086 line were compared with the sequences in three major public protein databases (NR, TAIR and CDD) using the Basic Local Alignment Search Tool X (BLASTX) algorithm with an E-value cutoff of 10 25 . A total of 127,342 and 106,128 unigenes (contigs and singletons; Table 1) from pap1 and BR086 respectively were used for annotation using NR, TAIR and CDD database (Tables S2-S7). Collectively, a total of 55,019 (43.2%) and 39,570 (37.28%) unigenes from pap1 and BR086 transcriptome data respectively were annotated using these databases (Table S8). Out of all, 25,343 and 19,225 unigenes from pap1 and BR086, respectively, had common BLAST hits to annotated proteins in NR, TAIR and CDD (Figure 2c and 2d). More than 38.47 and 30.95 percent of singletons from pap1 and BR086 also got annotated using these protein databases. The large number of singleton sequences with BLAST matches indicates that unassembled sequences still provide an abundance of valuable sequence information and improve overall transcriptome coverage.
Gene Ontology (GO) assignments describe gene products in terms of their associated molecular functions, biological processes and cellular components. To assign putative functional roles to the unigenes from pap1 and BR086, BLASTX in Arabidopsis (TAIR) proteome was carried out to provide the GO annotation. Of the assigned GO terms, 17,140 (pap1) and 16,615 (BR086) unigenes were to biological processes, 17,084 (pap1) and 16,600 (BR086) unigenes to molecular function, as well as 15,294 (pap1) and 14,752 (BR086) unigenes to cellular components, indicating a large functional diversity of genes in the transcriptomic data ( Figure S1). Among these, 7,654 and 7,428 unigenes in pap1 and BR086, respectively, had an assignment in all three categories. The remaining unigenes failed to obtain a GO term, largely due to their uninformative (e.g. 'unknown', 'putative', or 'hypothetical' protein) description. The even distribution of assignments of proteins to more specialized GO terms further indicates that the P.somnifera 454 sequences obtained from pap1 and BR086 represent proteins from a diverse range of functional classes. These three main categories of the GO classification have been further divided in 45 functional groups (Figure 3). In each of the three main categories of the GO classification, 'other metabolic process', 'other cellular processes', and 'unknown biological process ' terms are dominant. In the biological process category, a large number of genes were annotated to 'other cellular and metabolic processes', 'response to abiotic or biotic stimulus', and response to stress. Hence, established transcriptome from pap1 and BR086 should substantially aid the discovery of novel genes involved in the biosynthesis of secondary natural products including BIAs. No major difference in number of unigenes in different GO terms was observed in pap1 and BR086 cultivar.

Features of pap1 and BR086 combined assembly
High-quality (HQ) expressed sequence tags 770,126 and 763,579 for pap1 and BR086 line, respectively, were tagged, pooled and assembled together to get more insight about complete poppy transcriptome. This was carried out with the aim to get larger contigs, to increase depth and to analyze differential gene expression (DGE) for specific-unigenes. HQ reads from both transcriptomes assembled into 533,705 unigenes consisting of 46,846 contigs with average length 567.5 bp and 118,841 singletons with average length of 266.2 bp. Length frequency distribution of contigs clustered between 100-4379 bp and most of the contigs ranged in length from 400 to 1200 bp (Figure 4a). The length of contigs and the number of reads assembled into that contig in combined assembly are more in comparison to independent assemblies of the pap1 and BR086 (Table 1; Fig. 2b).
Unigenes from combined assembly of Pap1 and BR086 were annotated with different databases (TAIR, NR, CDD) using the Basic Local Alignment Search Tool X (BLASTX) algorithm with an E-value cutoff of 10 25 . A total 57,493 unigenes (22,213 contigs and 35,280 singletons) were annotated with TAIR database. In NR and CDD database, 59,438 and 33,781 unigenes were annotated, respectively. Common annotation for 33,664 unigenes was provided by all three databases (Figure 4b). The best hit for each unigene queried against the TAIR database from combined assembly was utilized to assign functional GO annotation in terms of biological process (17,140), molecular function (17,271) and cellular component (15,421) groups. These three main categories of the GO classification have been divided in several functional groups and subgroups ( Figure 5). Three main categories of the GO classification includes molecular function (Figure 5a), cellular component ( Figure 5b) and biological process (Figure 5c). In the biological process category, a large number of genes were annotated to 'other cellular and metabolic processes', 'transferase activity' and 'enzyme activity'. Several set of unigenes were associated with unknown metabolic processes and unknown cellular component. These were further categorized in several subgroups (Table S9). In these subgroups, vast majority of unigenes were found to be involved in sub cellular control of metabolic processes. A total of 7,738 unigenes were common in all three GO categories ( Figure S2).
Functional classification and pathway assignment was performed by the Kyoto Encyclopedia of Genes and Genomes (KEGG). To identify the biological pathways that are functional in poppy, 57,493 annotated sequences from combined assembly were mapped to the reference canonical pathways in KEGG. In total, all contigs and singleton sequences of poppy were assigned to 124 KEGG pathways. Interestingly, 3,665 unigenes were found to be involved in biosynthesis of various secondary metabolites ( Table 2). Among all the pathways, the cluster for 'Limonene and pinene degradation [PATH: ko00903] represents the largest group followed by Phenylpropanoid biosynthesis [PATH: ko00940] and Isoquinoline Alkaloid Biosynthesis [PATH: ko00950] respectively. A broad survey of cellular metabolism involved in conversion of sucrose to BIAs resulted in identification of transcripts corresponding to substantial number of metabolic pathway enzymes. Analysis suggested that all the known genes involved in the biosynthesis of BIAs were present in the combined transcriptome data generated in our study ( Figure S3). Number of reads which assembled to specific BIA biosynthesis related genes is provided in Figure S4. Out of all the genes, berberine bridge enzyme (BBE) had lowest number of reads (583) whereas (S)-norcoclaurine-6-O-methyltransferase (6OMT) was formed with highest number of reads (13,442). The annotated contigs homologous to genes of the primary metabolic pathways displayed high homology to Arabidopsis or other dicot genes, with most of the genes having more than 80% similarity at the protein levels, suggesting that these genes were highly conserved during the evolution.

Identification of differentially expressing transcript involved in papaverine biosynthesis
Before our study, there have been two studies on poppy transcriptome which generated information of unigenes from poppy cell cultures (93,723 unigenes) for sanguinarine biosynthesis [27] and comparative analysis of eight different cultivars (58,140 to 111,841 unigenes) [39]. These studies were confined to identify genes involved in the pathways and no comparative analysis was carried out for elucidation of a single biochemical route of specific alkaloid. However, our comparative analysis using pap1 and BR086 transcriptomes was aimed to identify putative genes which might be involved in papaverine biosynthesis. To identify differentially expressing genes, the R statistical test [32] was applied and only R value $8 was filtered that gave a believability of .99%. In this test, the singletons were statistically insignificant and hence discarded. Out of 46,920 contigs from combined assembly of pap1 and BR086 only 3,336 were significantly (R value  (Table S10) with more than 2-fold change in expression in pap1 in comparison to BR086. Likewise 1943 genes were down-regulated (Table S11) with more than 2-fold change in expression in pap1 in comparison to BR086. Of these, 76 upregulated and 272 down-regulated genes did not give hits in any of the databases analyzed and were novel genes and may be involved in different pathways or molecular networks involved in primary and secondary metabolic pathways. Our analysis suggested that out of all known genes involved in BIA biosynthesis, TYDC, NCS and CYP80B3 does not express differentially between pap1 and BR086. This suggests that substrate flux for the biosynthesis of (S)norcoclaurine which in part of both NH-and NCH 3 -pathways ( Figure 6) may not be deciding factor for the papaverine biosynthesis.

Putative genes involved in uncharacterized steps of papaverine biosynthesis
Using different set of experiments, studies suggest two putative and controversial routes of papaverine biosynthesis ( Figure 6). Both the pathway utilize (S)-norcoclaurine synthesized from precursor L-tyrosine. Out of two pathways, the NH-pathway which utilizes N-desmethylated intermediates was proposed on the basis of feeding experiments with labeled precursors combined with controlled degradation of labeled papaverine [40] and in vitro functional characterization of norreticuline-7-Omethyltransferase (N7OMT) [21]. The second proposed pathway (NCH 3 pathway) proceeds through several N-methylated intermediates and involves (S)-reticuline [23]. However acceptance of both (S)-reticuline and (S)-norreticuline as substrate does not discount either the N-desmethyl or N-methyl benzylisoquinoline routes [24]. The controversy also exists due to absence of information related to genes/enzymes involved in uncharacter-ized steps like 39-O-methylation and an unknown dehydrogenation required for papaverine biosynthesis. These three steps are bottlenecks in the proposed pathway and identification of genes involved in these steps will lead to elucidate exact pathway. It has been suggested that to solve this controversy, there is need to use systems biology approach utilizing mutants/contrasting chemotypes of papaverine biosynthesis, NGS technology and functional genomics. Recently, such analysis have been carried out using top1 (thebaine oripavine poppy 1) mutant which has helped in identifying genes involved in thebaine biosynthesis [17]. In case of papaverine biosynthesis pathway several missing links need to be elucidated, these uncharacterized step involves members of methyltransferase, cytochrome P450, and dehydrogenase gene families. In our transcriptome data, we analyzed for transcripts encoding members of these gene families and studied their differential expression.
After pooling and stringent filtering of combined assembled transcriptome data, a total of 363 contigs annotated as methyltransferases were identified (Table S12). Out of these, more than 100 contigs are already reported in BIA pathways from different plants. A total of 36 contigs showed annotation with Papaver somniferum EST database. Several contigs were annotated as 49OMT, 69OMT, CNMT, 7OMT and N7OMT as well as SAM dependent methyl transferase already known in P.somniferum. Hierarchy clustering analysis suggest that number of MTs are differentially expressed between pap1 and BR086 (Figure 7a). Of the total MTs, 22 and 37 are at least 2-fold up-or down-regulated in pap1 in comparison to BR086. The up-regulated contigs also include contigs showing homology to 49OMT (contig30047; 3.4fold), 69OMT (contig25475; 2.2-fold), CNMT (contig 27244; 2.3fold) and N7OMT (contig52316; 2.1 fold) (Table S12) which are categorized in N-desmethyl group. Out of total down-regulated MTs, (contigs Nos. 26349, 42117 and 9735) were down-regulated most. In these, contig26349 encoding (R,S)-reticuline 7-Omethyltransferase (7OMT) was down-regulated (4.65-fold) which is proposed to be involved in N-CH3 pathway and utilizes (S)reticuline ( Figure 6) to synthesize papaverine [23]. As per proposed pathways, if NCH 3 -pathway is involved in enhanced biosynthesis of papaverine in pap1, it will require up-regulation of 7OMT to divert flux towards papaverine biosynthesis instead of BIAs. Interestingly, our analysis revealed more than 4-fold downregulation of 7OMT in pap1. Collectively, up-regulation of known 6OMT, 4OMT and N7OMT as well as down-regulation of 7OMT suggests that in pap1 mutant, (S)-reticuline may not be preferred substrate for papaverine biosynthesis. However, to utilize (S)-coclaurine via NCH 3 pathway for papaverine biosynthesis, additional 39OMT is required.This route of pathway needs methylation at three steps which can be catalysed by 49OMT, N7OMT and 39OMT. In our transcriptome data, several unigenes homologus to N7OMT showed 1.7-to 2.1-fold more expression (Table S12) in pap1 in comparison to BR086 suggesting involvement of N7OMT in papaverine biosynthesis. Earlier, 49OMT has been shown to be promiscuous and involvement of same enzyme was shown in NH-and NCH 3 -pathway [25]. 49OMT with established function in biosynthesis of BIAs is upregulated more than 3-fold in pap1 in comparison to BR086. Together, up-regulation of N7OMT and 49OMT suggests that that NCH 3 -pathway plays important role in papaverine biosynthesis.
We also looked for putative candidates genes encoding 39OMT which can be part of uncharacterized step in NCH 3 -pathway. Out of 22 total up-regulated (more than 2-fold) MTs, interestingly, one unigene (contig23231) (Table S12) showed maximum homology to 9-O-methyltransferase (9OMT) from Thalictrum tuberosum is up- regulated more than 5-fold (Table S12; Figure 7a). Recently, SOMT1 annotated as 9-O-methyltransferase accept both (S)norreticuline or (S)-reticuline as substrate. This SOMT1 is the only O-methyltransferase which functions efficiently on both Nmethylated and N-desmethylated benzyl isoquinoline. SOMT1 gene shows 39-O-methyltransferase activity and silencing of SOMT1 reduce papaverine level [24]. As contig23231 shows homology to SOMT1, we conclude that transcript related to this contig might be encoding for 39OMT and participate in papaverine biosynthesis via NCH 3 route. In our transcriptome data, a total of 584 dehydrogenases were present (Table S13) which falls into short chain Dehydrogenase/ reductase (SDR) families including tropionine reductase, alcohol dehydrogenase and malate dehydrogenase. Hierarchy clustering analysis suggest that number of dehydrogenases are differentially expressed between pap1 and BR086 (Figure 7b). N-desmethyl route of papaverine biosynthesis involved uncharacterized dehydrogenase which converts (S)-tertahydropapaverine to papaverine ( Figure 6). Of the total dehydrogenases, 34 and 33 are at least 2fold up-or down-regulated in pap1 in comparison to BR086. The up-regulated dehydrogenases identified in this study, might be involved in the uncharacterized step leading to papaverine biosynthesis from (S)-tetrahydropapaverine ( Figure 6). Validation of expression of differentially expressed genes Analysis of transcriptome data from pap1 and BR086, identified genes encoding members of methyl transferase and dehydrogenase gene families which might be involved in papaverine biosynthesis. To validate differential expression of these identified genes, we selected 10 methyl transferases (5 up-and 5-down-regulated) and 10 dehydrogenases family (5 up-and 5-down-regulated) members for validation of expression. All these genes showed differential expression (up and down-regulation) in pap1 in comparison to BR086 (Figure 8a and 8b; upper panel). The differential expression observed was similar to digital expression results (Figure 8a and 8b; lower panel) suggesting that identified genes, apart from those taken for validation through real-time PCR, are differentially expressed between pap1 and BR086. Similar analysis through analysis of transcriptomes from different tissues for differentially expressed genes has led to the identification of genes involved in secondary plant product biosynthesis [41]. The validation of expression of selected genes through real-time PCR suggests that other differentially expressed genes identified, in this study, through digital expression analysis may be involved in various uncharacterized steps in papaverine biosynthesis.

Conclusion
Biosynthetic pathways for paparverine production in Papaver somnifera have only been putative and controversial; our study attempted to rectify this discrepancy through transcriptional profiling in a high-papaverine producing mutant, pap1, compared to wild-type, BR086. Our comparative transcriptome analysis clearly suggests that papaverine biosynthesis does not primarily proceed through the N-demethylated branch-point intermediate (S)-reticuline as proposed one of the pathways. Instead, biosynthesis of papaverine appears to utilize (S)-coclaurine as a key branch-point intermediate. Our analysis also identifies a set of methyl transferases and dehydrogenases which might be playing role in uncharacterized steps in papaverine biosynthesis through Figure 7. Differential expression of methyltransferases (a) and dehydrogenases (b) with their differential expression in pap1 and BR086. Two columns in each represent pap1 and BR086, while each row represents contigs encoding members of these families (Table S10 and S11). Clustering was carried out with log tpm value of each contig in pap1 and BR086 transcriptome to visualize differential expression. doi:10.1371/journal.pone.0065622.g007 NCH 3 -pathway. Functional characterization of identified genes in this study will lead to establish papaverine biosynthetic pathway which has been controversial for a long time.        Figure 8. Validation of differentially expressed transcripts in pap1 and BR086. Real time quantitative PCR of selected differentially regulated methyltransferase (a) and dehydrogenase gene family (b) was carried out using total RNA isolated from peduncle tissue of prehook stage of pap1 mutant and BR086. Digital gene expression of genes used for validation through quantitative Real time PCR is provided in lower panel of (a) and (b). Information about selected up-and down-regulated transcripts is provided in Table S1. doi:10.1371/journal.pone.0065622.g008