Genome survey sequencing for the characterization of genetic background of Dracaena cambodiana and its defense response during dragon’s blood formation

Dragon’s blood collected from the genus Dracaena is used as a renowned traditional medicine in various cultures worldwide. However, the genetics of the genus Dracaena and the formation mechanism of dragon’s blood remain poorly understood. Here, we generate the first draft genome reference assembly of an elite Chinese Dracaena species, Dracaena cambodiana, from next-generation sequencing data with 89.46× coverage. The reads were assembled into 2,640,704 contigs with an N50 length of 1.87 kb, and a 1.05 Gb assembly was finally assembled with 2,379,659 scaffolds. Furthermore, 97.75% of the 267,243 simple sequence repeats identified from these scaffolds were mononucleotide, dinucleotide, and trinucleotide repeats. Among all 53,700 predicted genes, 158 genes involved in cell wall and plant hormone synthesis and reactive oxygen species scavenging showed altered regulation during the formation of dragon’s blood. This study provides a genomic characterization of D. cambodiana and improves understanding of the molecular mechanism of dragon’s blood formation. This report represents the first genome-wide characterization of a Dracaena species in the Asparagaceae.


Introduction
Asparagaceae is a new family derived from the Liliaceae by the Angiosperm Phylogeny Group (APG) in 1998 [1].In this family, Dracaena is one of the oldest genera, and Dracaena species are used as ornamental or horticultural plants worldwide [2,3].An injured trunk or branch of a Dracaena plant can exude a red resin, known as dragon's blood, which has been utilized as a traditional medicine for wounds, fractures, piles, leucorrhea, diarrhea, stomach and intestinal ulcers, and even some types of cancer in the histories of many cultures [4][5][6][7][8].Modern chemical and pharmacological studies have indicated that the flavonoids, saponins, terpenes, and steroids in dragon's blood are pharmacodynamic compounds [9][10][11].In China and Southeast Asian countries, D. cambodiana has been preferred for dragon's blood extraction and is widely cultivated [12].Due to their medicinal and economic importance, wild Dracaena plants have been exploited excessively, and many of them, including D. cambodiana, have been considered endangered [13].
The dragon's blood product is pharmaceutically valuable, and its availability is limited by the exhaustion of its plant sources and its time-consuming preparation.To date, the molecular mechanism of dragon's blood formation has remained unclear, though it is thought to be a defensive metabolite secreted from the wounded stems of Dracaena for protection against pathogens [4,14].This hypothesis suggests that the formation of dragon's blood involves a special defense response in Dracaena plants.The expression of defense-related genes and the synthesis of defensive substances are two crucial defense mechanisms protecting plants from biotic stress [15], and these strategies include the NBS-LRR genes and the endogenous plant hormone salicylic acid [16,17].However, the limited genomic and genetic resources available for Dracaena impede the mechanistic exploration of dragon's blood formation.Only the expression of genes related to flavonoid and saponin synthesis during dragon's blood formation has been described through transcriptome sequencing [18,19].
In this study, we sequenced the genome of D. cambodiana and performed a draft assembly to examine the genetic background of Dracaena and the molecular mechanism of dragon's blood formation.Understanding this special defense response of Dracaena will allow new advancements in molecular breeding for this important medicinal plant.

Plant materials and DNA extraction
Five tender branches about 10-20cm were collected from individual D.cambodiana plant on the Dazhou Island (Wanning, Hainan Province, China) after authorized by the operator on duty, and then were planted in plantation at the Institute of Tropical Bioscience and Biotechnology, Haikou, China (19˚59 0 N, 110˚19 0 E).Both original Dracaena tree on the Dazhou Island and its five branches cultured are all still being alive for now.Leaves sample collecting have been authorized by our institute within the project supported by funding of 1630052 016002.The young leaves of D. cambodiana were disinfected with 75% ethyl alcohol and then frozen and stored in liquid nitrogen for genomic DNA extraction.The total genomic DNA of D. cambodiana was extracted using a plant genomic DNA extraction kit (Tiangen Biotech, Beijing, China).Subsequently, its quantity and quality were assessed using a ScanDrop 100 spectrophotometer (Analytik Jena, Germany) and 1.5% agarose gel electrophoresis.

Genome sequencing and genomic size estimation
Paired-end (PE) libraries with insert sizes of 270 and 500 base pairs (bp) were constructed following the Illumina standard protocol [27].Sequence data were generated using the Illumina HiSeq 2500 platform.The filtered clean reads were used for estimation of genome size, percentage of repetitive sequence, and heterozygosity by using k-mer analysis [28].

Genomic sequence assembly and GC content estimation
The preprocessed PE reads were assembled using SOAPdenovo, and the optimal k-mer size was selected for the maximum N50 of contigs [29].The scaffolds were progressively constructed with PE reads of different insert sizes.Only scaffolds of more than 1000 bp in length were retained in the final assembly.The GC content was calculated using 10 kb nonoverlapping sliding windows along the assembled sequence.

Simple sequence repeats (SSRs)
To inspect the complement of SSRs and provide strategies for the genome sequencing or assembly of D. cambodiana, an appropriate repetitive sequence library was constructed for predicting repeat sequences using LTR_FINDER with the de novo data described earlier [30].Then, TRE and RepeatMasker 3.3.0were used to search for homologous tandem or interspersed repeats, respectively [31,32].SSR motifs were determined using SciRoKo [33].

Genes involved in defense response during dragon's blood formation in D. cambodiana
Defense genes encoding lipoxygenase (LOX), allene oxide synthase (AOS), allene oxide cyclase (AOC), 12-oxo-phytodienoic acid reductase (OPAR), phenylalanine ammonia lyase (PAL), isochorismate synthase 1 (ICS1), acyl-adenylate/thioester-forming enzyme (PBS3), BAHD acyltransferase (EPS1), proline dehydrogenase (PDG), trehalose phosphate synthase (TPS), ascorbate peroxidase (APX), glutathione reductase (GR), superoxide dismutase (SOD), peroxidase (POD), P450, and their related proteins that are involved in jasmonic acid, salicylic acid, proline, and trehalose synthesis or reactive oxygen species (ROS) response during dragon's blood formation in D. cambodiana were selected from the gene prediction results and then illuminated by using HemI 1.0 [44] with our published transcriptome database.The genes encoding the enzymes that catalyze the synthesis of cell wall components in D. cambodiana were also analyzed using the same methods.The RNA-seq samples were collected from 6 cm above the injection site in 3-year-old D. cambodiana stems at 3 days and 6 days after injection with a special inducer [18].Healthy stems cut from D. cambodiana trees were used to generate material for the 0-day library.

Quantitative rela-time RT-PCR
To validate the RNA-seq results, quantitative RT-PCR was conducted by the following method.The product cDNA was diluted into 50 ng�μL -1 , and 1μL was used for each Real-time quantitative PCR (RT-qPCR).The RT-qPCR reaction mixtures (20μL) also contained 0.4 μL of each gene specific primer, 10μL 2×TransStart qPCR SuperMix (Transgen, Beijing, China) and 8.2μL RNase-free water.The RT-qPCR thermal cycling included cDNA degenerated in 94˚C for 30 sec, with 40 cycles of 94˚C for 5 sec and then 60˚C for 30 sec in the Strata Gene Mx3005P Real-Time PCR System (Agilent lnc., USA) with the SYBR green method.The βactin of D. cambodiana was chosen as a housekeeping gene with internal control their relative expression were assessed with 2 -ΔΔCt method [18].All RT-qPCR experiments were performed in triplicate and the gene-specific primers used in expression analysis are listed in S1 Table.

Genome sequencing and genome size estimation
To accurately investigate the genomic background of D. cambodiana, three libraries with insert lengths of 270 and 500 bp were constructed.The Q20 or Q30 values for evaluating the base quality of the sequencing data were above 90%.On the basis of these data, a total of 100.22 Gb raw sequencing data were provided, and the sequence coverage was approximately 89.46×.(Table 1).Then, these clean data were used for k-mer analysis.For the 21-mer frequency distributions, the peak of depth distribution was approximately 73, and the estimated genome size was 1.12 Gb (Fig 1).Similarly, the ratios of repetitive sequences and heterozygosis were calculated using the k-mer distribution, with results of 53.45% and 0.38%, respectively.

Genome sequence assembly and GC content analysis
The N50 of the contigs was 1.87 kb, with a total length of 1.01 Gb, and the N50 of the scaffolds was 3.19 kb, with a total length of 1.06 Gb (Table 2).The longest contig and scaffold were 139,994 bp and 348,119 bp, respectively.The GC content of the D. cambodiana genome was 37.35%, which is considered a moderate GC content (Fig 2).Moreover, the GC depth distribution was obviously divided into two peaks.This result was partly caused by a 0.38% heterozygosity rate (Fig 2).

Protein-encoding gene prediction and annotation
We predicted 53,700 genes by using Gene ID (Table 5).The average lengths of the identified genes, exons, and introns were 2,030.67,197.91, and 636.37 bp, respectively (Table 5).44.56% of the gene predicated were interactively supported by the transcripts in public RNA-seq (S3 Table ).Of these 53,700 predicted genes in the D. cambodiana assembly, 38,162 mapped genes were known genes in the public databases, of which 36,901 genes had Nt homologs, 22,153 had TrEMBL homologs, 14,873 had Swiss-Prot homologs, 12,859 had Pfam homologs, 13,093 had KOG homologs, 12,510 had GO homologs, 9,258 had KEGG homologs and 6,433 had COG homologs (Table 6).
When blasted against the NCBI NR database, 22,312 (41.55%) of the 53,700 genes possessed significant similarity to plant nucleotide sequences in GenBank (Table 6).The species most represented in the NR homolog analysis were Elaeis guineensis and Phoenix dactylifera, which belong to the Arecaceae (Table 7).
In total, 13,093 and 6,433 genes were matched in the KOG and COG functional classifications, respectively (Fig 4 and Table 5).The largest cluster in the KOG and COG analyses was general function prediction only (S1 and S2 Figs), followed by the posttranslational modification, protein turnover, chaperone, and signal transduction mechanism categories in KOG and the transcription (844, 9.44%), translation, ribosomal structure, and biogenesis (604, 6.75%) categories in COG.
Gene family analysis revealed that 38,162 predicted family member genes in D. cambodiana were shared among five plant species.Of these, 7,582 family member genes were clustered with Arabidopsis thaliana, Asparagus officinalis, Dendrobium officinale, or Populus euphratica, whereas 1,139 predicted genes were unique to D. cambodiana (Fig 5).

Putative genes associated with defense response during dragon's blood formation in D. cambodiana
According to the genome annotation of D. cambodiana, many unique sequences were annotated as plant defense response genes.In this study, these unigene sequences were reanalyzed with the public RNA-seq data of dragon's blood formation in D. cambodiana.Of these sequences, 41, 38, and 79 sequences were annotated to be involved in the synthesis of plant cell wall components, plant defense substances, and oxidative stress response, respectively (S5 Table ).
During the formation of dragon's blood in D. cambodiana, one pectate lyase, two pectin esterases, and six chitinases were activated (Fig 6A).Genes encoding antioxidases (such as APX, GR, GST, SOD, and POD), P450 and the phytoene synthase enzyme were also upregulated (Fig 6B ).Meanwhile, genes involved in plant hormone and defense compound synthesis,  such as JA-related genes (LOX, AOS, AOC, and OPAR), SA-related genes (ICS and EPS1), proline-related genes (PDG), and trehalose-related genes (TPS), were also induced in this process (Fig 6C ), indicating a joint effect of systemic acquired resistance and induced systemic resistance during the formation of dragon's blood.Some other genes involved in cell wall formation, plant defense substance synthesis and oxidative stress response were simultaneously regulated during dragon's blood formation.However, their expression was lower than that in the earlier stage of dragon's blood formation; these genes included PAL, PBS3, and the enzymes catalyzing naringenin, permease, carotene, cellulase, or galactosidase synthesis (Fig 6).

RT-qPCR validation of differential gene expression
To investigate the transcriptional response of defense related genes during dragon's blood formation were differently expression, 15 DEGs involved in plant defense response were chosen for RT-qPCR assay.Most of the selected DEGs were differentially expressed in stem under inducer treatment, showing similar patterns as reflected by FPKM values (Fig 7).Therefore, this result provided reliable and accurate transcriptional profiling data for further studies on the cross-talk between plant defense response and mechanism of dragon's blood formation in D. cambodiana.
Medicinal or horticultural trees are generally perennial and highly heterozygous [58,59].Previous studies have suggested that heterozygosity greater than 0.5% presents difficulties for short-read-based assembly, because a random selection learning strategy cannot be applied to heterozygous loci [60].However, highly heterozygous genomes have been characterized using a cost-effective strategy with the Platanus software since 2015.These genomes have included those of the crown-of-thorns starfish, Papilio glaucus, and Ananas comosus, with genome heterozygosities of 0.92%, 1.8%, and 1.89%, respectively [61,62,63].After a comprehensive analysis of the total data from three sequencing libraries, the heterozygosity of D. cambodiana was confirmed as 0.38%, which was lower than the threshold for a highly heterozygous genome.
Compared with other plant genomic information and on the basis of the GC content, heterozygosity, repetitive element content, and genome size described earlier, large-insert libraries of genomic DNA and high sequencing depth were appropriate for the whole-genome sequencing of D. cambodiana.The final genome might also be assembled under a higher kmer value than that used in this study [64,68].Not only did the genome sequence survey technology provide strategies for whole-genome assembly in future projects [69], but more importantly, partial nucleic acid and protein information was simultaneously obtained via the assembly and annotation of raw reads from the genome sequence survey [53].Such an approach could also provide more genome-level genetic information for D. cambodiana without a complete genome sequence and presumably improve the understanding of the connection between defense response and dragon's blood formation in D. cambodiana.
In nature, only Dracaena trees 30-50 years of age can produce a small amount of dragon's blood [18].Our previous studies revealed a chemical inducer that can induce the formation of red resin in young D. cambodiana [18,19].This special inducer, which contains 37.5 g/L NaCl and 1.25 ml/L acetic acid, can quickly induce the formation of the major constituents of dragon's blood in the stems of 3-year-old D. cambodiana; in particular, flavonoids can be detected by HPLC analysis at 3 days, 6 days and 9 days after injecting the inducer.After inducer injection treatment for 6-12 months, qualified red resin can be collected from the stems of D. cambodiana.Previous studies have also reported that red resin production can be induced in healthy stems by the pathogenic microbes, such as Fusarium graminearum and Fusarium proliferatum, that were isolated from a dragon's blood-secreting stem of D. cochinchinensis [4,14].Plant defense responses may be involved in dragon's blood formation in Dracaena species, based on the phenomena of wounding, induction, microbial infection, and red resin formation.
A common feature of microbial infection in plants is passing though the plant cell wall, which is the first barrier of plants against pathogen attack [70].To this end, the main structural components of plant cell walls, such as pectin, cellulose, chitin, and other polysaccharides, are depolymerized with special enzymes secreted by the microbe [71].In this process, the plant genes encoding the enzymes to degrade these components can also be regulated by the pathogen [72].Subsequent studies have found that such genes are a part of the plant immune system and can be regulated by various stresses [73].The gene expression profiles examined in this study indicated that a chemical inducer can regulate genes encoding galactosidase, cellulase, chitinase, pectin esterase, and lyase in D. cambodiana (Fig 6A ).
All plant stress reactions can produce ROS, which are an important signaling factor in plants and may link to systemic acquired resistance, programmed cell death, and plant hormone signaling [74][75][76].However, the presence of ROS can further injure healthy plant cells.Hence, the antioxidant system is triggered during the ROS burst to protect normal cells from superfluous ROS damage.The antioxidant system commonly includes such antioxidases as SOD, POD, and GST [77][78][79][80].Here, the expression patterns of APX, GR, GST, POD, permease, and P450s were upregulated during dragon's blood formation in D. cambodiana.Furthermore, some genes expressed during dragon's blood biosynthesis encoded natural antioxidants [81, Plant hormones play important roles in regulating plant development and defense response.Genes related to plant development and defense response can be regulated by pathogens, insects, wounding, exogenous JA, SA, benzothiadiazole (BTH), ABA, NaCl, and other biotic and abiotic stresses [16,17,[83][84][85].In this study, the expression of JA-related genes (LOX, AOS, AOC, and OPAR), SA-related genes (PAL, PBS3, ICS, and EPS1), and osmosisrelated genes (PDG and TPS) in the stems of D. cambodiana was enhanced by a chemical inducer (Fig 6C).Previous studies have indicated that plant hormones, which are essential for plant response to biotic and abiotic stresses, can modulate secondary metabolite accumulation in plants [16,84,86].Consistent with previous reports [18,19], this study further demonstrated the potential connection between the defense response and dragon's blood formation in D. cambodiana.

Conclusions
This study is the first to report the genomic characterization of Dracaena on a genome-wide scale.Of the 50 Dracaena species, D. cambodiana is one of the most important in terms of its horticultural and medicinal values in China and Southeast Asia.However, limited genetic information has impeded studies of D. cambodiana, especially the mechanism of dragon's blood formation.The regulatory expression analysis of candidate genes involved in the plant defense response may help elucidate this mechanism in D. cambodiana.The 53,700 total genes derived from our assembly may facilitate genetic and genomic studies.The characterization of this genetic information may provide fundamental parameters for the sequencing and assembly strategies of the D. cambodiana genome program.

Fig 1 .
Fig 1. K-mer (k = 21) analysis for estimating the size of the D. cambodiana genome.The occurrence of 21-mers was calculated using Jellyfish version 2.1.3,based on the sequencing data from three short-insert libraries of D. cambodiana.The genome size was estimated by the following formula: Genome size = K-mer num/Peak depth.The subpeak on the left of the main peak was caused by genome heterozygosity.https://doi.org/10.1371/journal.pone.0209258.g001

Fig 3 .
Fig 3. Percentages of various dinucleotide and trinucleotide repeat motifs in the D. cambodiana genome.A: Percentage of various dinucleotide repeat motifs in the D. cambodiana genome; B: Percentage of various trinucleotide repeat motifs in the D. cambodiana genome.https://doi.org/10.1371/journal.pone.0209258.g003

Fig 5 .Fig 6 .
Fig 5.The number of gene clusters in D. cambodiana and other species.The first number under the plant species name is the total number of putative genes used for clustering.The second number under the plant species name is the number of clusters or families.https://doi.org/10.1371/journal.pone.0209258.g005