Transcriptomic analysis elucidates the molecular processes associated with hydrogen peroxide-induced diapause termination in Artemia-encysted embryos

Treatment with hydrogen peroxide (H2O2) raises the hatching rate through the development and diapause termination of Artemia cysts. To comprehend the upstream genetic regulation of diapause termination activated by exterior H2O2 elements, an Illumina RNA-seq analysis was performed to recognize and assess comparative transcript amounts to explore the genetic regulation of H2O2 in starting the diapause termination of cysts in Artemia salina. We examined three groupings treated with no H2O2 (control), 180 μM H2O2 (low) and 1800 μM H2O2 (high). The results showed a total of 114,057 unigenes were identified, 41.22% of which were functionally annotated in at least one particular database. When compared to control group, 34 and 98 differentially expressed genes (DEGs) were upregulated in 180 μM and 1800 μM H2O2 treatments, respectively. On the other hand, 162 and 30 DEGs were downregulated in the 180 μM and 1800 μM H2O2 treatments, respectively. Cluster analysis of DEGs demonstrated significant patterns among these types of 3 groups. GO and KEGG enrichment analysis showed the DEGs involved in the regulation of blood coagulation (GO: 0030193; GO: 0050818), regulation of wound healing (GO:0061041), regulation of hemostasis (GO: 1900046), antigen processing and presentation (KO04612), the Hippo signaling pathway (KO04391), as well as the MAPK signaling pathway (KO04010). This research helped to define the diapause-related transcriptomes of Artemia cysts using RNA-seq technology, which might fill up a gap in the prevailing body of knowledge.


Introduction
During Artemia diapause, the metabolic process of egg cells reduces [1], thereby decreasing energy consumption; furthermore, molecular chaperones are accustomed to prevent or decrease protein denaturation. Brine shrimp possess a solid, semipermeable outer shell that protects their embryos while making sure embryo contact with external circumstances. Brine shrimp contain trehalose, which exhibits a fluid retention capacity that protects embryos from lethal harm due to dehydration [2,3]. Diapause termination of Artemia cysts was proven to be regulated by environmental elements. Robbins et al. [4] indicated that the hatching rate of Artemia cysts improved by 50% when subjected to 180 μM hydrogen peroxide (H 2 O 2 ). Based on the experimental outcomes obtained by these studies, chemical methods may be employed to terminate cyst diapause. However, the mechanism of diapause termination has not been elucidated and therefore is a subject that warrants in-depth investigation.
Several studies on cyst diapause have been around in arthropods utilizing a transcriptomic approach to elucidate the gene expression involved with diapause or diapause termination. For examples, transcriptomic profiling uncovered significant changes in gene expressions related to gonad development, lipid synthesis, and molt regulation, suggestive of a coupling of the processes in diapause initiation in copepod Calanus finmarchicus [5][6][7]. Tu et al. [8] used the proteomic and transcriptomic tools to investigate the distinctions between diapause and non-diapause eggs at both protein and mRNA expression levels. These differentially expressed proteins and genes were associated with metabolism and cryoprotection.
Researches on Artemia species have exposed to exterior biotic or abiotic elements to ensure that they provide a basic background data of Artemia transcriptomes, these researches demonstrate that the power of RNA-seq to recognize molecules in Artemia, however these studies were focusing on the adult individuals [9,10]. In this research, we uncovered brine shrimp to H 2 O 2 to induce their diapause termination. Subsequently, RNA-seq was used| to investigate the transcriptional profiles of H 2 O 2 -induced diapause termination of Artemia cysts to characterize the physiological regulation of the diapause termination process at the transcriptomic level.

Artemia cyst samples
The modified procedure of the acquisition of Artemia salina diapause cysts was according to the method provided by Robbins et al. [4]. The cysts were incubated in artificial seawater in a 15 cm Petri dish for 7 days. The shell residuals and hatched individuals were removed each day. After 7-days incubation, the unhatched-cysts were used and harvested for diapause. The diapausing cysts were then incubated in a stationary condition in a Petri dish (15 cm) with H 2 O 2 at 0, 180 and 1800 μM for 5 h.

RNA quantification and qualification
RNA qualification and quantification were determined by electrophoresis analysis at 1% agarose gels to monitor whether its degradation and contamination. RNA purity was measured by utilizing NanoPhotometer1 spectrophotometer (IMPLEN, CA, USA). RNA concentration was determined using a Qubit1 RNA Assay Kit in Qubit1 2.0 Fluorometer (Life Technologies, CA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA).

Library preparation for transcriptome sequencing
From the RNA sample to the ultimate data, each step, including sample testing, library collecting, and sequencing, influences the quality of the data which impacts the analysis consequences. To ensure the reliability of the data, quality control (QC) is conducted at each step in this research. The workflow is shown in S1A Fig. A total amount of 1.5 μg RNA per sample was used for the RNA sample preparations. Sequencing libraries were produced using the NEBNext1 Ultra™ RNA Library Prep Kit for Illumina1 (NEB, USA) following manufacturer's guidelines, and index codes were put into attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was completed using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase. DNA polymerase I and RNase H were applied to synthesize the Second strand cDNA synthesis subsequently. The rest of the overhangs were changed into blunt ends via exonuclease/polymerase actions. After adenylation of the 3' ends of DNA fragments, NEBNext Adaptors with hairpin loop structures were ligated for hybridization. To obtain the specific cDNA fragments of 150~200 bp in length, the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA). Then, 3 μl USER Enzyme (NEB, USA) was used in combination with size-chosen, adaptor-ligated cDNA at 37˚C for 15 min accompanied by 5 min at 95˚C before PCR. After that, PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. Finally, PCR products were purified (AMPure XP system), and library quality was assessed on the Agilent Bioanalyzer 2100 system. Raw cDNA sequences have been deposited in Sequence Read Archive (SRA) at NCBI BioProject resources. The following accession numbers was PRJNA664008.

Data quality control
Raw data (raw reads) in fastq file format were processed with in-house Perl scripts. Clean data (clean reads) were obtained by dethatching reads that contain adaptors, poly-N and low-quality reads from raw data in this step. Simultaneously, the Q20, Q30, GC content and sequence duplication level of the clean reads were calculated. All downstream analyses were based on clean data with high quality.

Transcriptome assembly
Due to the lack of a reference genome for Artemia samples, clean reads have to be assembled to obtain a reference sequence for the next analysis (S1B Fig). The software Trinity [11] was applied to complete the transcriptome reconstruction. The left files (read1 files) from all libraries/samples were pooled into one left.fq file, and another right files (read 2 files) were pooled into one right.fq file. Transcriptome assembly was achieved based on the left and right fq. using Trinity transcriptome assembler with min/kmer/cov set to 2 by default, and all the parameters were at default configurations.
The workflow of Trinity was described followed: Inchworm: Constructs a k-mer dictionary from all sequenced reads (k = 25), selects the most frequent seeding k-mer in the dictionary and extends the seed in both directions to create a contig assembly. Chrysalis: Chrysalis clusters minimally overlapping Inchworm contigs into sets of connected elements and constructs comprehensive de Bruijn graphs for each element. Each element defines an assortment of Inchworm contigs that will tend to be derived from alternative splice forms or closely related gene paralogs. Butterfly: Butterfly reconstructs full-length, plausible, linear transcripts by reconciling the individual de Bruijn graphs produced by Chrysalis with the original sequence reads and paired ends. Butterfly reconstructs unique transcripts for splice isoforms and paralogous genes and resolves ambiguities stemming from sequences >k bases length or from errors that are shared among transcripts. The final assembled result named as TRINITY.fasta.
Gene function was annotated according to the information in following databases: Nt (NCBI non-redundant nucleotide sequences),Nr (NCBI non-redundant protein sequences), KOG/ COG (Clusters of Orthologous Groups of proteins), Pfam (Protein family), Swiss-Prot (A manually annotated and reviewed protein sequence database), GO (Gene Ontology) and KO (KEGG Ortholog database).
The provided information and characteristics of the seven databases are specific and following described as: Nr (NCBI non-redundant protein sequences): the formal protein sequence database of NCBI, which include protein sequence information from such databases as Gen-Bank, Swiss-Prot, PIR (Protein Information Resource), PRF (Protein Research Foundation), and PDB (Protein Data Bank).Pfam (Protein family) is the most comprehensive collection of protein families and domains, represented as multiple sequence alignments and profile by hidden Markov model. Pfam families are classified into two categories, Pfam-A and Pfam-B, based on the reliability of annotations. Both KOG (euKaryotic Orthologous Groups) and COG (Cluster of Orthologous Groups of proteins) are derive from NCBI's gene orthologous relationships. COG is specific to prokaryotes, while KOG is definitely specific to eukaryotes. Based on the evolutionary relationships among species, COG/KOG defined the homologous genes from different species into different ortholog clusters. Swiss-Prot was an annotated and reviewed high-quality protein sequence database which combines computed features, experimental results, and scientific conclusions. And KEGG (Kyoto Encyclopedia of Genes and Genome) is a resource for understanding the high-level functions and utilities database of each biological system.GO (Gene Ontology) is the established standard for the functional annotation of gene products and is used to classify the three following functional attributes of gene: BP: biological process, MF: molecular function, and CC: cellular component.

Quantification of gene expression levels
The degree of gene expression was estimated by RSEM [12]; clean data had been mapped onto the assembled transcriptome, and the read count for each gene was calculated from the mapping consequences.

Differential expression analysis
The value of read count obtained from the gene expression analysis was used for the initial data to carry out further differential expression analysis. Group differential expression analysis was conducted by using the DESeq R package (1.10.1) [13]. DESeq provides statistical routines, in which the one of the models based on the negative binomial distribution was used to determining the differential expression with digital gene expression in this study. For controlling the false discovery rate obtained from digital gene expression analysis, the P value of DESeq analysis was then adjusted by using Benjamini and Hochberg's approach. Genes with an adjusted p-value| <0.05 was considered as differentially expression.

Clustering and sequencing (Novogene Experimental Department)
Corset [14] functions by clustering contigs based on separates contigs and shared reads when different expression patterns observed between samples. Corset also filter out contigs with a low number of mapped reads with the information from reads (less than 10 reads). The clustering of the index-coded samples was performed on a cBot Cluster Generation System using the Illumina HiSeq PE Cluster Kit (cBot-HS) based on the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina HiSeq platform, and generated the paired-end reads.

Gene enrichment analysis
GO enrichment analysis of the DEGs was applied on the GOseq R package-based Wallenius noncentral hypergeometric distribution [15], which can adjust the bias of gene length in DEGs.
KEGG [16] is a database for understanding the high-level functions and utilities of genes in biological systems from molecular-level information. These genes could be collected from cells, organisms and ecosystems, especially large-scale molecular datasets generated by genome sequencing and high-throughput experimental technologies (http://www.genome.jp/kegg/) could provide more reliable evidence. Software KOBAS [17] was used to evaluated the statistical enrichment of differentially expressed genes in KEGG pathways. The software and parameters used in this study for data analysis are listed in S1 Table.

Increase of hatching rates after H 2 O 2 treatments
After 7-d incubation of Artemia cysts, the un-hatched ones were collected and cultured 12 hr for identification of H 2 O 2 treatments on the hating rates. The results showed that the hatching rate was 3.1 ± 0.90% in control group (0 μM H 2 O 2 ); 47.5 ± 5.31% in low-H 2 O 2 group (180 μM), and 4.4 ± 1.30% in high-H 2 O 2 group (1800 μM). Robbins et al. [4] and Hong et al. [9] have demonstrated that H 2 O 2 increases the hatching rates of Artemia cysts and the hatching rates decreased in higher H 2 O 2 than in lower H 2 O 2 treatments. The pattern of H 2 O 2 treatments on hatching rates is similar to our study.

Gene expression analysis
Transcriptomes of diapause cysts with or without H 2 O 2 -treated were sequenced individually, in which 26,252,846-49,563,734 raw reads were generated. The reads of adaptors and low quality were then removed by filtering process, so that there could have a clean reads for downstream analyses. The filtering process includes several steps: (1) to eliminate the reads with adaptor contamination. (2) to exclude the reads when uncertain nucleotides composition more than 10% of either reads (N > 10%). (3) to remove reads when low-quality nucleotides (base quality less than 20) constitute more than 50% of the total read. After filtering, 25,645,468-48,301,960 clean reads were obtained. The results of Kruskal-Wallis test revealed that the amounts of clean reads among the cysts with or without treating H 2 O 2 weren't considerably different (p = 0.2881). Therefore, the regulation of H 2 O 2 -induced diapause termination of Artemia cysts was related to the gene expression level rather than the amount of expressed genes. The de novo transcriptome filtered by Corset was used as a reference alignment. RSEM could map reads again to the transcriptome and quantify the expression level. S2 Table showed the mapping results. Both FPKM and RPKM used for gene expression level normalization. FPKM considered the fragment difference between these two reads of paired end reads, while RPKM considers reads: rather than using read counts, it approximates the relative abundance of transcripts regards to fragments observed from an RNA-seq experiment. That might not be represented by a single read, such as in paired-end RNA-seq experiments. In bioinformatic analysis without genome references, such as Artemia, the gene with FPKM>0.3 is considered to being expressed. This threshold is recommended and accepted by several prestigious journal and had been considered to be closely represent the high-level gene expression.
Clean reads were assembled by de novo method via Trinity to obtain the assembly transcriptome. To remove redundancies, the hierarchical clustering was carried out to the corset. Afterwards, the longest transcripts of each cluster were selected for unigenes. There were 114,153 transcripts and 114,057 unigenes were identified in the cyst samples (Table 1). With the priorities of genomic or transcriptomic techniques, recent researches had emphasized indepth studies at the transcriptome level in Artemia organisms [9,10,[18][19][20]. It was supposed that the genes involved with development or diapause termination might be correlated with the systematic changes| in each cell. Therefore, it could be the proof that there were more expressed genes in reactivated Artemia embryos in this study. For completeness of the Artemia transcriptome, the results provide valuable and supplementary information which could complete the presented library nowadays.

Gene functional annotation
To achieve comprehensive gene functional annotation, seven databases were applied by Novogene. These databases were utilized for functional annotation of identified unigenes, and the statistics of successfully annotated genes in this research by the database are shown in Fig 1  and S3 Table. Of the 114,057 unigenes, the number of genes successfully annotated in at least one database against NT, NR, KO, Swiss Prot, PFAM, KOG and GO was 47,020 (41.22%). The summary of gene annotations based on each database is listed in S3 Table. The functional annotation of these genes according to KOG, KEGG and GO classifications is demonstrated in Fig 2. In GO, the most top five annotated genes were linked to 1) cellular process, 2) binding, 3) metabolic process, 4) single-organism process, and 5) catalytic activity. Those in KOG were related linked to 1) general function mechanism only, 2) signal transduction mechanisms, 3) Q20: percentages of bases whose correct base recognition rates are greater than 99% in total bases (7) Q30: percentages of bases whose correct base recognition rates are greater than 99.9% in total bases (8) GC content: percentages of G and C in total bases https://doi.org/10.1371/journal.pone.0247160.t001 posttranslational modification, protein turnover, chaperones, 4) translation, and 5) ribosomal structure and biogenesis. In the KEGG pathway, these genes were involved with 1) signal transduction, 2) translation, 3) the endocrine system, 4) transport and catabolism, and 5) proteins folding, sorting and degradation. The patterns of GO, KOG and KEGG, classifications were consistent with those of the A. sinica transcriptome [10], except for the amount of expressed genes. Thus, the additional genes recognized from this study provided a supplementary draft for Artemia transcriptomes.

Differential gene expression related to H 2 O 2 -induced diapause termination
Volcano plots were used to infer the overall distribution of differentially expressed genes  Table 2.
Cluster analysis was used to discover genes with similar expression patterns under our experimental conditions (S2 Fig). By clustering genes with comparable expression patterns, it may be possible to discern unknown functions of previously characterized genes or the features of unknown genes. In hierarchical clustering, areas of different colors denote different groups (clusters) of genes, and each group may have similar functions or take part in the same biological process. It showed different patterns among these three groups significantly. GO and KEGG enrichment analyses were further utilized to gather genes into biological functions or pathways through bioinformatic databases and statistical tools. The GO enrichment analysis helped to elucidate the gene function of differentially expressed genes, while the KEGG enrichment analysis also helped to elucidate the metabolic pathways where the genes were involved.
Compared to the control group, the annotations of up-and downregulated DEGs in H 2 O 2 treatments by GO and KEGG enrichment analysis are listed in Tables 3 and 4. GO enrichment analysis of DEGs showed that the biological functions of DEGs were involved in the regulation of blood coagulation, coagulation, wound healing and hemostasis ( Table 3). The interactions of multiple genes could be involved in certain biological functions. As KEGG is a collection of manually curated databases regarding genomes, biological pathways, diseases, drugs, and chemical substances, KEGG was a powerful tool in utilized for pathway enrichment analysis that identified significantly enriched metabolic pathways or signal transduction pathways associated with differentially expressed genes compared with the control group (Table 4 and Fig 4). The degree of KEGG enrichment was measured by the Rich factor that refers to the ratio of the amount of DEGs in the pathway and the amount of all genes annotated in the pathway. The Q-value is the p-value after normalization, and its range is In previous studies, most of them on genes related to the regulation of diapause termination in Artemia cysts focused on genes related to adversity resistance [9,[21][22][23][24]. For examples, these reported genes or proteins are involved in the regulation of diapause termination have been elucidated including HSP70 [25], HSP p26 [26,27], ArHSPs [28][29][30][31][32] and heat shock factor 1 (Hsf1) [33,34]. In addition, There are also many recent studies on late embryogenesis abundant (LEA) proteins in their roles on Artemia [35][36][37][38][39][40][41][42]. The late embryogenesis abundant (LEA) proteins in plants, bacteria and invertebrates can prevent organisms from protein aggregation caused by low temperature-related dryness or osmotic pressure, and then LEA proteins prevent damage caused by drying, cold or high salinity of a variety of organisms. Thus, diapause termination and adversity resistance have many related mechanisms in regulation.    These previous studies provide us with a protective mechanism against adversity to explore the regulation of diapause termination. NGS was used to study the gene expressions of Artemia under high salinity, and it was found that high salinity would increase the influence of cell cycle arrest through La-related protein and many genes related to lipid transporters [9]. The DEGs involved in binding and catalytic activity were observed to be upregulated during initial development in Artemia cysts [43]. This finding is similar to the results obtained in this research in reactivated diapaused cysts, in which the physiological process for development or diapause termination is close to binding and catalysis. In our previous study, with a metabolomic approach to elucidate the metabolic regulation of H 2 O 2 -induced diapause termination. MetaboAnalyst pathway analysis showed the pathway impact of metabolites in groups displaying amino acid degradation and biosynthesis and aminoacyl-tRNA biosynthesis [44]. Among them, aminoacyl-tRNA biosynthesis that was also identified and up-regulated in H 2 O 2 (1) Gene Ontology entry.
(3) GO types, including cellular component, biological process and molecular function.
(4) p-value in hypergenometric test.   treatments in this study at transcriptional level. Additionally, we also discovered the other regulation pathways involved in H 2 O 2 treatments which were the first findings in diapause termination of Artmeia cysts. In this study, we analyzed global gene expressions in Artemia cysts in the use of H2O2 to induce diapause termination. These findings were different from the previous studies that focused on genes or proteins involved in the regulation or responses against environmental stress. We found that these DEGs are involved in coagulation regulation and wounds. healing, hemostasis, antigen processing and presentation, Hippo signaling pathway and MAPK signaling pathway. Therefore, this work shows that diapause termination of Artemia cysts is regulated by other pathways which are important on known or novel mechanisms in diapause termination.

Conclusions
In previous studies, most of the diapause termination focused on the regulation related to antistress responses, however, in this study, apply with a RNA-seq approach to discover upstream genetic regulation mechanisms that have not been described in Artemia cysts. The novel findings show a lot of genes and pathways which provide a new view to diapause termination of Artemia cysts.