Genomic Characterization of Phenylalanine Ammonia Lyase Gene in Buckwheat

The Phenylalanine Ammonia Lyase (PAL) gene, which plays a key role in the biosynthesis of the medicinally important compounds rutin and quercetin, was sequence-characterized to support its efficient use in genomics applications. These compounds possess anti-diabetic and anti-cancer properties and are predominantly produced by Fagopyrum spp. In the present study, the PAL gene was sequenced from three Fagopyrum species (F. tataricum, F. esculentum and F. dibotrys) and showed three SNPs and four insertions/deletions at the intra- and inter-specific levels. Among them, a potential SNP (G>C at position 949 bp) at a parsimony informative site was selected and successfully used to determine the zygosity/allelic variation of 16 F. tataricum varieties. Insertion mutations were identified in the coding region and altered a stretch of 39 amino acids in the putative protein. Our study revealed that the autogamous species (F. tataricum) has a lower frequency of observed SNPs than the allogamous species (F. dibotrys and F. esculentum). The SNPs identified in F. tataricum did not result in amino acid changes, while in the other two species they caused both conservative and non-conservative changes. The consistent pattern of SNPs across the species revealed their phylogenetic importance. We found two groups of F. tataricum, one of which was closely related to F. dibotrys. The sequence characterization of the PAL gene reported here can be used in the genetic improvement of buckwheat with reference to its medicinal value.


Introduction
Rutin and quercetin are plant metabolites with antioxidant properties and play a significant role in combating diabetes [1]. Diabetes is a chronic metabolic disorder that has resulted in the death of over one million people globally [2]. Besides diabetes, rutin helps in reducing the severity of colon carcinogenesis [3] and hypertension [4]. Rutin is neither present in cereals nor in China. The latter areas are recognised as the natural habitat of the Fagopyrum genus including its wild relatives [16]. The PAL gene has not been studied in most wild relatives of the Fagopyrum genus. However, research efforts over the past few decades have focused on F. dibotrys, so this species has been used extensively for the characterization of this gene [11]. Rutin is a phenolic compound present in high concentrations in 'tartary buckwheat' and to a limited extent in 'common buckwheat' [17]. Very little information is available concerning the genetic analysis of different species of the genus Fagopyrum, which has left this genus underutilized. F. esculentum and F. dibotrys are allogamous, whereas F. tataricum is an autogamous species. Morphological similarities suggested a greater closeness between F. dibotrys and F. esculentum [18,19]. Conversely, recent RFLP-cpDNA molecular analysis revealed that F. dibotrys is more closely related to F. tataricum than to F. esculentum [20]. An in-depth characterization of different Fagopyrum species with important genes (such as the PAL gene) will lead to an increased taxonomic understanding and ultimately help in their genetic enhancement as crops of economic value.

Results
Allele mining of PAL gene in F. tataricum and related species
Molecular profiling of the PAL gene from different accessions of F. tataricum, F. esculentum and F. dibotrys revealed species-specific allelic sequence variations in the form of SNPs and/or indels (Fig 2 and S1 Fig).
In addition to the species-specific sequence signatures, intra-specific variations were also found. SNP alleles were designated with the letters 'A' and 'B' (Fig 3). Furthermore, two accessions carried three insertions in exon2 (Fig 3 and S2 Fig) that altered a stretch of amino acids in the putative protein; this allele was designated 'A1' (Table 1).

Effect of SNPs/Indel on putative protein
Annotation of allele 'A1' from F. tataricum sequences revealed three insertion mutations that cause a frame shift affecting 39 amino acids in the ORF of the PAL gene. This frame shift resulted in an altered stretch of amino acids in the putative protein, corresponding to exon2, in two F. tataricum accessions (Figs 3 and 4). Amplified sequences of the other two Fagopyrum species were submitted to GenBank, and alleles were designated according to the SNPs causing amino acid changes, as shown in Table 1.

SNP analysis in natural populations of F. tataricum
A SNP at the 949th base pair position of the PAL gene (Fig 5) was present in the homozygous and heterozygous condition in 10 and 6 accessions of F. tataricum, respectively. Subsequent intra-varietal analysis revealed that, among 77 genotypes, 50 and 27 samples were homozygous and heterozygous for this SNP, respectively (Fig 6). Further, analysis

Interspecies sequence analysis
Interspecies polymorphic site analysis revealed the most polymorphic sites in F. dibotrys, followed by F. esculentum and F. tataricum. Phylogenetic analysis indicated taxonomic closeness of F. tataricum and F. esculentum, which was further supported by the presence of relevant SNPs and indel mutations (S3 Fig). However, parsimony informative sites (PIS) in linkage disequilibrium (LD) were not collinear in their exact nucleotide positions among these three species, although some PIS shared identity with other species (Table 2).
Gene flow and genetic differentiation resulted in three haplotypes among the accessions of F. tataricum. Homozygous and heterozygous individuals identified by Tetra primer ARMS PCR were subjected to Hardy Weinberg Equilibrium analysis, which revealed 60 and 33 genotypes with alleles GG and GC with allele frequencies of 62.5% and 37.5% respectively (Fig 5). Further screening of this SNP in intra-varietal genotypes revealed more homozygous (64.93%) than heterozygous (35.06%) genotypes, as shown in Fig 6. The phylogenetic study indicated the presence of two F. tataricum groups, each sharing the SNP sites (949th and 1346th) in LD and PIS separately with F. dibotrys and F. esculentum (Fig 7). The divergence time tree showed the relatively early divergence of the ancestor of the clade containing Fagopyrum spp. and Medicago truncatula compared with the ancestor of the rest of the dicot clade (Fig 8).
Sequencing did not identify heterozygosity at the 949th bp position. However, Tetra primer ARMS PCR revealed heterozygous genotypes (GC) in the natural populations. Interestingly, the homozygous CC genotype was not found with this method. These results are in agreement with Hardy Weinberg Equilibrium.
In practice, it was not possible to observe homozygous individuals with the CC genotype. Its frequency was predicted to be 0.03 ($q^2 = 0.03$) under Hardy Weinberg Equilibrium. The $\chi^2$ value was 4.33, with a significant P-value of 0.0374 (P<0.05). Using the frequencies of the p and q alleles (0.82 and 0.18), the genotype frequencies were calculated according to the Hardy Weinberg Equilibrium ($p^2 + 2pq + q^2 = 1$). Thus, $p^2 = 0.6724$, $2pq = 0.2952$ and $q^2 = 0.0324$ (approximately 0.03).
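The Hardy Weinberg calculation above can be reproduced with a short script. This is a minimal sketch, assuming the genotype counts reported in this section (60 GG, 33 GC and no CC); the function name `hwe_test` is hypothetical, and the OEGE calculator used in the study may differ in rounding or implementation details.

```python
import math

def hwe_test(n_gg, n_gc, n_cc):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions
    for one biallelic SNP (1 degree of freedom)."""
    n = n_gg + n_gc + n_cc
    p = (2 * n_gg + n_gc) / (2 * n)   # frequency of the G allele
    q = 1.0 - p                       # frequency of the C allele
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_gg, n_gc, n_cc)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, q, expected, chi2, p_value

# Counts taken from the text: 60 GG, 33 GC, 0 CC.
p, q, expected, chi2, p_value = hwe_test(60, 33, 0)
print(f"p = {p:.2f}, q = {q:.2f}")
print(f"expected GG/GC/CC = {expected[0]:.1f} / {expected[1]:.1f} / {expected[2]:.1f}")
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

With these counts the expected number of CC individuals is about three out of 93, which illustrates why the CC homozygote is expected to be rare in a sample of this size.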

Discussion
Sequence characterization of the PAL gene, which plays an important role in the rutin and quercetin biosynthesis pathway, was carried out in Fagopyrum spp. in this study. Species-specific sequence signatures were observed, which are relevant both to the evolution of the genus Fagopyrum and to the putative protein structure. Three insertion mutations and three SNPs were identified in F. tataricum. Of the three SNPs, one was a singleton variant and the other two were PIS, at the 949th and 1346th bp positions. The SNPs at the 949th and 1346th bp positions were located in intron1 and exon2 of the PAL gene, respectively. The three insertion mutations caused a variation in a stretch of 39 amino acids in exon2 of the ORF compared with the reference PAL protein, ACT68010 (Figs 3 and 4). These insertion mutations caused a frame shift affecting 39 amino acids, resulting in a different protein isoform, in accordance with previous reports [21,22]. The altered protein may contribute to the evolution of adaptive proteins [23] and may show structural and functional changes. Theoretical predictions of physicochemical properties revealed that the protein of allele 'A1' (with the altered stretch of 39 amino acids) possessed 57 positively charged residues (arginine + lysine) and a theoretical isoelectric point (pI) of 6.19, while the reference protein possessed 53 positively charged residues and a pI of 5.81. The variant region of 39 amino acids of allele 'A1', considered alone, was classified as unstable, as its calculated instability index (II) was 76.08, exceeding the stability threshold of 40 [24]. The synonymous transition mutation observed at the 1346th position did not change the amino acid serine. Further, comparison of the putative protein of PAL allele A1 (AHC29062) with the reference PAL putative protein (Protein ID: ACT68010) indicated no change in the active site (GTITASGDLVPLSYIAG).
However, protein modelling suggested a significant alteration in the protein structure and thereby a possible alteration of physicochemical properties.
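Two of the simpler checks described above, counting positively charged residues and confirming that the active-site motif is present, can be sketched in a few lines. The sequence `toy_protein` below is hypothetical and serves only as an illustration; it is not the PAL sequence from this study, and the pI and instability index computed by ProtParam are not reproduced here.

```python
ACTIVE_SITE = "GTITASGDLVPLSYIAG"  # conserved motif reported in the text

def count_positive(seq):
    """Number of positively charged residues (arginine R + lysine K)."""
    return seq.count("R") + seq.count("K")

def find_motif(seq, motif=ACTIVE_SITE):
    """0-based start position of the motif, or -1 if it is absent."""
    return seq.find(motif)

# Hypothetical toy sequence: three leading residues, the motif, and a short tail.
toy_protein = "MKR" + ACTIVE_SITE + "ARTLYNNGASG"

print(count_positive(toy_protein))  # R + K count in the toy sequence
print(find_motif(toy_protein))      # where the active-site motif starts
```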
The amino acid changes in exon2 of F. esculentum are shown in Fig 9. There were five amino acid changes: two were conservative (Glutamine to Glutamic acid, Valine to Isoleucine) and the other three were non-conservative (Proline to Asparagine, Histidine to Arginine, Cysteine to Arginine). Similarly, five amino acid changes were observed in F. dibotrys (Fig 10). In F. dibotrys exon2, the SNPs caused two conservative amino acid changes (Glutamine to Glutamic acid, Glutamic acid to Aspartic acid), whereas the other SNPs caused non-conservative changes (Cysteine to Arginine, Valine to Lysine, Methionine to Lysine). However, the positions of the amino acid changes were not collinear between F. esculentum and F. dibotrys. In F. tataricum, no amino acid change resulted from the observed synonymous SNP, while the three insertion mutations changed a long stretch of amino acids. Apart from these non-silent mutations, more than 30 silent SNPs were observed in both F. esculentum and F. dibotrys, while only one silent mutation was observed in F. tataricum. Overall, we found more SNP mutations in the allogamous species F. esculentum and F. dibotrys than in the autogamous F. tataricum. Conversely, indel mutations were observed only in F. tataricum (not in F. dibotrys or F. esculentum) and caused a major change in the putative protein (Fig 3). The SNP and indel mutations observed in F. tataricum, F. dibotrys and F. esculentum point to the evolutionary role of the PAL gene in Fagopyrum spp.
The sequences of F. tataricum were divided into two sub-groups (groups 1 and 2) according to the SNPs at the 949th and 1346th bp positions (Table 3). Genetic diversity within and between the two groups revealed that group 1 is more diverged than group 2. The genetic differentiation of the two sub-groups was statistically significant in pairwise comparison. Haplotype-based statistics for the genetic differentiation of these two groups were significant with the PM test (Table 3). This finding was further supported by the Fst estimate and the number of effective migrants (Nm), which pointed to an absolute migration with low gene flow (Table 3). A similar trend in haplotype diversity was reported previously [25]. These results clearly indicate the phylogenetic importance of the two tightly linked PIS at the 949th and 1346th bp SNP positions.
The putative PAL protein of Fagopyrum spp. (generated in our study) was aligned with PAL proteins from other dicot species. This alignment identified the conserved signature motif 'GTITASGDLVPLSYIAG'. Further, we calculated the relative divergence time (0.8), which revealed an early divergence of the ancestor of the clade containing Fagopyrum spp. and Medicago truncatula from the ancestor of the clade containing the other dicot species analysed (0.7). Within the clade, the divergence time revealed an earlier divergence of Fagopyrum spp. (0.1) than of Medicago truncatula (0.0). It is noteworthy that both M. truncatula and Fagopyrum spp. are well known for rutin production [26,6], whereas in most other dicots PAL has been predominantly associated with lignin and anthocyanin production [27,28,29]. In particular, 8 amino acids were identical between these two species, but not in other species, in the region corresponding to amino acid positions 642 to 652 of the F. tataricum PAL protein: 'ARTLYNNGASG'. Therefore, the protein sequence alignment revealed the close similarity of the amino acid sequences of Fagopyrum spp. and Medicago truncatula, which is highly likely to be associated with the rutin biosynthesis pathway.
Two SNPs in F. tataricum (at the 949th and 1346th bp positions) showed LD and PIS, and one of them (at the 949th bp position) showed association with agronomically important traits. The SNPs at the 949th and 1346th bp positions were located in the intron (the only intron of this gene) and in exon2, respectively. The SNP at the 949th position was always found in LD with that at the 1346th position: a mutation at the first site was always paired with the SNP at the second site. Interestingly, heterozygosity at these sites showed correlation with increased seed number, reduced plant height and 100-kernel weight (Table 4). It is well established that SNPs at splice sites or branch points of an intron may affect splicing, so that the resulting mRNA transcript may be abnormal. In this study, however, the intronic mutation lay outside these splice sites and branch points, so functionally it has no direct role; nevertheless, the intronic SNP (949th position) was always paired with the exon2 SNP at the 1346th position due to LD. If the SNP at the 949th bp (intron) is altered, the exon SNP at the 1346th bp is altered as well. Based on these facts, we hypothesize that the SNP mutation in the exon has a 'functional agronomic' role. However, a definitive test is needed to confirm this.
Numerous studies have focused on SNP analysis of the PAL gene in different plant species to improve the yield of rutin, anthocyanin, lignin or related metabolites [30,31,32,33]. Among Fagopyrum spp., total flavonoid content is commonly higher in F. tataricum than in F. esculentum. Among released F. tataricum varieties, 'Donan' is very popular and is known for its high thousand kernel weight, as revealed in our study (data not presented). This variety can be used as a potential germplasm source for medicinal applications.

Polymorphic sites in Fagopyrum spp. at inter and intra species level
Through interspecies sequence analysis of the three Fagopyrum species, we identified PIS and other useful sites (Table 2). The disparity index revealed a homogeneous substitution pattern between F. tataricum and F. dibotrys, with significant heterogeneity between F. esculentum and F. dibotrys (S1 Table). Distance matrix index values also revealed that the distance between F. tataricum and F. esculentum is greater than that between F. tataricum and F. dibotrys. The distance index between F. dibotrys and F. tataricum was 4-5%, while the distance index between F. dibotrys and F. esculentum was 15-16% (S2 Table, Fig 7). Similar results were presented in previous reports [34]. F. esculentum and F. tataricum had two PIS, while four PIS were observed in F. dibotrys. Intra-specific SNPs were most numerous in F. dibotrys (18), followed by F. esculentum (11), and least numerous in F. tataricum (3). SVs were also fewer in F. tataricum than in the other two, allogamous, species. In these three species, balancing selection maintained monomorphic sites at 553 positions, so that variation at only 116 positions allowed these species to be discriminated. In contrast, adaptive mutation reduced the variation at these 553 positions, which have remained unchanged during evolution (Table 2). In addition, the gene exhibited significant variation in the form of a 42 bp deletion in F. tataricum and F. dibotrys, as shown in S3 Fig (corresponding to an insertion in F. esculentum).
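To make the idea of a pairwise distance index concrete, the sketch below computes the proportion of differing sites (p-distance) between two aligned sequences and applies the Jukes-Cantor correction. This is an illustration under stated assumptions: the exact distance measure behind the reported 4-5% and 15-16% values is not specified here, the function names are hypothetical, and the two short sequences are invented toy data.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.
    Columns containing a gap ('-') in either sequence are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy aligned sequences (hypothetical, not data from this study).
toy_a = "ACGTACGT"
toy_b = "ACGTACGA"
p = p_distance(toy_a, toy_b)
print(p, jukes_cantor(p))
```

For small distances the correction changes the value only slightly, so a p-distance of a few percent and its Jukes-Cantor estimate are nearly the same.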
In F. tataricum, three pairs of sites in LD were observed, and among them the pair at the 949th and 1346th bp positions was statistically significant (S3 Table). The allelic pattern at this LD site in the PAL gene is depicted in Fig 7. This LD event divided F. tataricum into two different groups (groups I and II). The SNP allele of F. tataricum group II at the 949th bp (cross-species comparison site 952) was identical to that of F. esculentum and F. dibotrys at this locus, indicating that this allele was contributed to F. tataricum by F. dibotrys/F. esculentum, while the group I allele came from some other progenitor. A similar observation for the F. tataricum group I allele at the 1346th bp (cross-species comparison site 1395) supported this conclusion. Notably, these PIS and/or recombinations were found within a 400 bp region of the PAL gene. Other LD events were present in this gene among the different Fagopyrum spp., as indicated in S4 Fig. LD sites were more numerous in the allogamous species (F. dibotrys and F. esculentum) than in F. tataricum. F. tataricum group II was closer to F. dibotrys than group I, as shown in Fig 7. This reveals the importance of SNPs with LD and PIS in the PAL gene for evolution. These SNP and indel variations indicated that F. tataricum is more closely related to F. dibotrys than to F. esculentum (Fig 7 and S3 Fig). The species-specific sequence signatures in the PAL gene of the three Fagopyrum spp. emphasize the phylogenetic importance of this gene.
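Linkage disequilibrium between two biallelic sites is commonly summarised by D and r². The following is a minimal sketch of that calculation from phased two-site haplotypes; the function name `ld_r2` and the haplotype lists are hypothetical, and the alleles used for the second site are placeholders, since the actual alleles at the 1346th bp are not stated here. It is not the DnaSP implementation used in the study.

```python
def ld_r2(haplotypes):
    """r-squared between two biallelic sites.
    haplotypes: list of 2-character strings (allele at site 1, allele at site 2)."""
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0][0], haplotypes[0][1]  # arbitrary reference alleles
    p1 = sum(h[0] == ref1 for h in haplotypes) / n
    p2 = sum(h[1] == ref2 for h in haplotypes) / n
    p12 = sum(h[0] == ref1 and h[1] == ref2 for h in haplotypes) / n
    d = p12 - p1 * p2                                # disequilibrium coefficient D
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom

# Hypothetical haplotypes in which the two sites are always paired (complete LD).
paired = ["GA"] * 6 + ["CG"] * 4
# Hypothetical haplotypes in which the two sites vary independently.
independent = ["GA", "GG", "CA", "CG"]

print(ld_r2(paired))
print(ld_r2(independent))
```

Sites that are always inherited together, as described for the 949th and 1346th bp positions, give an r² close to 1, while independently varying sites give a value close to 0.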
There were three types of inter-specific SNPs: (i) those that represented both LD and PIS, (ii) those that showed only LD and not PIS, and (iii) those that represented only PIS. The SNPs that represented both LD and PIS in F. tataricum were species-specific, i.e. they were not comparable across the species (Table 2). As a pair, the SNPs at the two positions in LD in F. tataricum did not share identity with the other species, while one SNP of the pair shared identity with either species. SNPs with PIS alone shared more identity between F. tataricum and F. dibotrys than with F. esculentum. These results indicate that species-specific SNPs are under selection pressure when they are in LD. The breakage of LD due to mutation, genetic drift and absence of selection pressure might disturb these SNPs. The SNP at the 949th bp position has two possible homozygous genotypes, 'GG' and 'CC'. Interestingly, in the natural population of F. tataricum we could detect only the homozygote 'GG' and the heterozygote 'GC'. The 'CC' homozygote was identified neither by sequencing the gene nor by the Tetra primer ARMS PCR strategy. Following the Hardy Weinberg Equilibrium, we predicted the 'CC' homozygote to be rare, with a frequency of 0.03 (about 3%). This was the most probable reason for not identifying the rare 'CC' genotype in the present study.
The present study provides an in-depth sequence characterization of the PAL gene in Fagopyrum spp., which are known for their medicinal value. The sequence information concerning the SNPs/alleles can be used to identify elite cultivars from germplasm collections of F. tataricum and related species within the genus Fagopyrum, as well as species from other genera of the plant kingdom. Certain insertions/deletions caused major amino acid variations in F. tataricum, possibly due to genomic plasticity events in this species that go beyond normal mutations and thus caused enormous variation. Comparative genomics of these kinds of alleles with other species may uncover rare mutations in those species. The overall analysis points to an evolutionary significance of the PAL gene in the genus Fagopyrum. The information presented in this report can be used efficiently in the genetic improvement of Fagopyrum spp. with respect to their medicinal relevance.

Materials and Method
Genotypes and DNA extraction
Sixteen accessions of F. tataricum were used for the screening of inter- and intra-specific diversity. To facilitate the understanding of the evolutionary relationships, five accessions of F. esculentum and five of F. dibotrys were also included. The genetic material was obtained from different sources, as shown in Table 5.
To analyse intra-varietal zygosity, about four genotypes of each F. tataricum accession were germinated in petri plates, transferred to pots and grown in a greenhouse.
For each genotype, approximately 100 mg of fresh leaves were collected from 4-week-old plantlets and ground in liquid nitrogen. Total DNA was extracted by the CTAB method [35], quantified using a MaestroNano Micro-Volume Spectrophotometer (Cat. No. MN-913, Maestrogen) and diluted with sterile distilled water to obtain a DNA template with a concentration of 50 ng/μl. The same methodology was followed for the extraction of DNA from individual genotypes of the F. esculentum and F. dibotrys accessions listed in Table 5.

Polymerase Chain Reaction and Sequencing
Specific forward and reverse primers for the F. tataricum PAL gene were designed (S4 Table) using the reference sequence available at GenBank [13]. The primers, synthesised by Sigma Aldrich S.r.l. (Milano, Italy), allowed amplification of the whole gene, from start to stop codon, in a single Polymerase Chain Reaction (PCR). Alternatively, additional primer pairs were designed to anneal to different regions, so that the fragments obtained, when overlapped, would cover the whole length of the gene. The PCR reaction volume was fixed at 25 μl and included the following reagents: 2 μl of dNTP 200 μM, 1.5 μl of 3 mM MgCl2, 2.5 μl of 1X Reaction buffer, 0.2 μl of 1 Unit Bioline Taq, 1 μl of 1 pM Forward primer, 1 μl of 1 pM Reverse primer, 15.8 μl of sterile distilled water and 1 μl of DNA template.
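The reagent volumes listed above add up to the stated 25 μl, which can be checked and scaled to a master mix with a few lines of code. This is a minimal sketch: the dictionary keys, the function name `master_mix` and the 10% pipetting overage are assumptions made for illustration and are not part of the protocol described here.

```python
# Per-reaction volumes in microlitres, as listed in the text.
REACTION_UL = {
    "dNTP": 2.0,
    "MgCl2": 1.5,
    "reaction buffer": 2.5,
    "Taq": 0.2,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "water": 15.8,
    "DNA template": 1.0,
}

def master_mix(n_reactions, overage=0.10):
    """Volumes for a shared master mix (template is added per tube).
    The 10% overage is an assumed allowance for pipetting loss."""
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in REACTION_UL.items()
        if name != "DNA template"
    }

total_volume = sum(REACTION_UL.values())
print(f"total per reaction: {total_volume:.1f} ul")
print(master_mix(10))
```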
The PCR amplification was performed on a Mastercycler pro (Eppendorf) thermocycler using the following cycling program: initial denaturation at 94°C for 5 minutes; 35 cycles of 1 minute denaturation at 94°C, 1 minute annealing at 57°C and 1.5 minutes extension at 72°C; and a final extension at 72°C for 10 minutes. Samples were stored at 4°C overnight and subsequently mixed with 2 μl of MaestroSafe Nucleic Acid loading dye (Cat. No. MR-031201, Maestrogen). Amplified fragments were resolved by 2% agarose gel electrophoresis at 90 V for 90 minutes. Each time, the band of the expected size was visualized on an UltraSlim LED Illuminator (Cat. No. SLB-01W, Maestrogen), identified by comparison with a 1 kb molecular-weight size marker (DNA ladder) (AccuRuler) and excised from the gel with a clean scalpel. Excised fragments were purified using a Sigma Aldrich GenElute agarose gel purification kit following the manufacturer's directions.
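The cycling program can also be written down as a small data structure, which makes it easy to estimate the run time. The sketch below encodes the steps given in the text; the tuple layout and the function name `hold_minutes` are hypothetical, and ramping time between temperatures is ignored, so the real run would be somewhat longer.

```python
# (step name, temperature in degrees C, minutes, number of repeats)
CYCLING_PROGRAM = [
    ("initial denaturation", 94, 5.0, 1),
    ("denaturation", 94, 1.0, 35),
    ("annealing", 57, 1.0, 35),
    ("extension", 72, 1.5, 35),
    ("final extension", 72, 10.0, 1),
]

def hold_minutes(program):
    """Total programmed hold time in minutes, ignoring temperature ramps."""
    return sum(minutes * repeats for _, _, minutes, repeats in program)

total_minutes = hold_minutes(CYCLING_PROGRAM)
print(f"programmed hold time: {total_minutes} min")
```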
The concentration of the purified fragments was measured with a MaestroNano Micro-Volume Spectrophotometer (Cat. No. MN-913, Maestrogen) and adjusted to 56 ng/μl. One μl of this solution was mixed with 13 μl of sterile distilled water and 1 μl of the appropriate 10 μM primer. The resulting reaction mixture was sent for sequencing with AB1 sequencer by Ylichron/Genechron, Rome. Previously synthesised internal primers were used for sequencing (S4 Table).

Utilization of the sequences for SNPs identification and phylogenetic analysis
Chromatograms were screened using the Finch TV (Geospiza Inc., USA) chromatogram viewer. Sequences of the expected fragment were aligned using Clustal W [36], and SNPs and insertion/deletion mutations were detected manually. Among these, the potential SNP (949, G>C) at a Parsimony Informative Site (PIS) was selected and used as the basis for designing the Tetra primer ARMS PCR. Phylogenetic analysis and relative divergence time estimation were done using MEGA (Molecular Evolutionary Genetics Analysis software) [37]. Using the PAL gene/alleles generated in this study together with reference gene sequences from NCBI, a phylogenetic tree was constructed by the Maximum Likelihood method with the Jukes-Cantor (JC) model and 1000 bootstrap resamplings. In addition, the F. tataricum putative PAL protein (AHC29062) was subjected to BLASTp against the non-redundant (nr) protein database at NCBI, and dicot orthologous sequences with 98-100% query coverage and 85-99% similarity were retrieved and aligned using Clustal X [36]. Subsequently, after excluding gaps and missing data, a time tree was generated with RelTime using the Maximum Likelihood method with the Jones-Taylor-Thornton (JTT) model and 1000 bootstrap resamplings [38]. Nucleotide substitutions were assessed through the disparity index [39] using a Monte Carlo test with 500 replicates.

Table 5. Buckwheat varieties and accessions utilized for the screening of inter- and intra-specific diversity: origin and seed source. Columns: Species/variety, Origin, Source.

Genetic analysis was done using Gamma statistics for gene flow estimates of haplotypes [40], and DeltaST [41], Nst [42] and Fst [43] for sequence-based gene flow estimates; these and other analyses were done using DnaSP v5 [44]. Hardy Weinberg Equilibrium was assessed with the OEGE Hardy-Weinberg Equilibrium calculator [45] using the numbers of homozygous and heterozygous genotypes obtained from Tetra primer ARMS PCR. Tetra primers were designed using the program available at the web server http://primer1.soton.ac.uk/primer1.html [14]. The Tetra primer ARMS PCR reaction master mix and primers are shown in S5 and S6 Tables, respectively. Inter- and intra-specific SNPs with PIS were subjected to evolutionary analysis. Step 3: Final extension of 10 minutes at 72°C. The outer band amplicon size was 484 bp, and the G allele and C allele amplicon sizes were 297 bp and 244 bp, respectively. To improve amplification, the concentrations of outer and inner primers were maintained at a 1:2 ratio (10 μM of outer primer and 20 μM of inner primer). The amplified products were resolved and visualized on a 5% agarose gel. Further primers were designed and the same methodology was applied to amplify either the whole PAL gene or fragments of it from F. esculentum and F. dibotrys, and fragments from clear chromatogram-derived FASTA files were assembled using CAP3 [46]. Protein modelling was done using Geno3D [47], and models were visualized and annotated with RasMol [48]. Active site finding was done with ScanProsite tools, and protein physicochemical parameters, including the instability index, were calculated using the ProtParam tool at the ExPASy server http://web.expasy.org/tools/protparam/protparam-doc.html [24].
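The amplicon sizes given above (484 bp outer band, 297 bp for the G allele, 244 bp for the C allele) translate directly into a genotype-calling rule for the Tetra primer ARMS PCR gel. The sketch below is illustrative only: the function name `call_genotype`, the 15 bp matching tolerance and the use of the outer band as a positive control are assumptions, not details taken from the protocol.

```python
OUTER_BP = 484     # outer (control) amplicon
G_ALLELE_BP = 297  # G allele-specific amplicon
C_ALLELE_BP = 244  # C allele-specific amplicon

def call_genotype(band_sizes, tolerance=15):
    """Call the SNP genotype from estimated band sizes (bp) in one gel lane.
    tolerance is an assumed allowance for sizing error on the gel."""
    def has_band(target):
        return any(abs(size - target) <= tolerance for size in band_sizes)

    if not has_band(OUTER_BP):
        return "failed"  # no control band: amplification is treated as unreliable
    has_g, has_c = has_band(G_ALLELE_BP), has_band(C_ALLELE_BP)
    if has_g and has_c:
        return "GC"
    if has_g:
        return "GG"
    if has_c:
        return "CC"
    return "failed"

print(call_genotype([484, 297]))       # homozygous GG
print(call_genotype([480, 300, 240]))  # heterozygous GC
```

Because the two allele-specific bands differ by 53 bp, a high-percentage gel such as the 5% agarose gel described above is helpful for separating them clearly.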

Phenotypic analysis study
All phenotypic and genotypic data were imported into MS-Excel, and the phenotypic results were compared between homozygous and heterozygous genotypes at the SNP position with a parsimony informative site and linkage disequilibrium. The statistical analysis of phenotypic traits with respect to zygosity was done using the R program [49].

Conclusion
F. tataricum and F. esculentum are medicinally important species in addition to being pseudocereal food crops. Genetic and genomic studies of these two species are widely focused on enhancing their medicinally important flavonoid compounds, rutin and quercetin. We report here that the medicinally important PAL gene has evolutionary significance in Fagopyrum spp. Further, we provide a detailed sequence characterization of this gene, which led to the identification of novel SNP and indel variations. The information generated in this report can be used efficiently in the genetic improvement of the under-utilized domesticated Fagopyrum spp. as a nutraceutical food resource.