De novo assembly, annotation, marker discovery, and genetic diversity of the Stipa breviflora Griseb. (Poaceae) response to grazing

Grassland is one of the most widely-distributed ecosystems on Earth and provides a variety of ecosystem services. Grasslands, however, currently suffer from severe degradation induced by human activities, overgrazing pressure and climate change. In the present study, we explored the transcriptome response of Stipa breviflora, a dominant species in the desert steppe, to grazing through transcriptome sequencing, the development of simple sequence repeat (SSR) markers, and analysis of genetic diversity. De novo assembly produced 111,018 unigenes, of which 88,164 (79.41%) unigenes were annotated. A total of 686 unigenes showed significantly different expression under grazing, including 304 and 382 that were upregulated and downregulated, respectively. These differentially expressed genes (DEGs) were significantly enriched in the “alpha-linolenic acid metabolism” and “plant-pathogen interaction” pathways. Based on transcriptome sequencing data, we developed eight SSR molecular markers and investigated the genetic diversity of S. breviflora in grazed and ungrazed sites. We found that a relatively high level of S. breviflora genetic diversity occurred under grazing. The findings of genes that improve resistance to grazing are helpful for the restoration, conservation, and management of desert steppe.


Introduction
Chinese grasslands are diverse and constitute the third largest grassland area worldwide, covering 41.7% of the country's territory and stretching from northern China to the Qinghai-Tibetan Plateau [1][2][3]. These grasslands are currently in danger of degradation owing to climate change and anthropogenic activities, and over 33% of the degradation is due to overgrazing [4,5]. Long-term overgrazing impacts species richness, community biomass, and soil quality [6][7][8]. However, plants can adapt to grazing pressure by developing certain characteristics, such as small sizes, fast growth, short life-spans, and low palatability, to avoid livestock

Collecting plant materials
The study area is in Wuchuan, Inner Mongolia Autonomous Region, China (41˚47 0 -41˚23 0 N, 110˚31 0 -111˚53 0 E) at an elevation of 1500-2000 m above sea level. This area belongs to the monsoon climate of medium latitudes. The mean annual precipitation and temperature are 354.1 mm and 2.6˚C, respectively. We observed three ungrazed sites (codes 1, 2, and 3) and three grazed sites (codes 4, 5, and 6) ( Table 1). The ungrazed sites have been fenced to prevent grazing for many years. In contrast, the grazed sites have been subjected to long-term heavy grazing. The area of each site is more than 1 hm 2 . These sites are distributed adjacently and are approximately around 2-10 km apart, indicating that they are under similar backgrounds of temperature and precipitation. The study was carried out in public land. No specific permissions are required for collecting samples, and no endangered or protected species are involved in study area.
In June 2016, we collected 33 samples at each site, with 3 samples for RNA-seq and 30 samples for genetic diversity analysis. For RNA-seq, we chose healthy and intact individuals in ungrazed sites, and chose individuals in obviously grazed patches in grazed sites. Samples for RNA-seq were rinsed with RNase-free dd water, mixed with equal amounts, stored in dry ice, and sent to the Beijing Genomics Institute (BGI, Shenzhen, China) for total RNA extraction and high-throughput sequencing. For genetic diversity analysis, we chose the healthy and intact individuals that were at least 10 m apart in both grazed and ungrazed populations. All samples were immediately frozen in liquid nitrogen and brought back to the Genetics Lab and stored at -80˚C for genetic diversity analysis.

RNA preparation and cDNA library construction
The total RNA of each sample was extracted using Trizol reagent (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's protocol. RNA integrity was examined using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and poly (A) mRNA was isolated using Oligo (dT) beads. The mRNA fragments were used as templates for reverse transcription to obtain cDNA. After fragment ends were repaired, polymerase chain reaction (PCR) amplification was performed, followed by poly (A) ligation and fragment length selection to construct a cDNA library. To obtain a high-quality library, an Agilent 2100 Bioanalyzer and ABI StepOnePlus real-time PCR system were used for quality control analysis.

RNA-seq and de novo assembly
The libraries were sequenced using the Illumina HiSeq 4000 platform (Illumina, USA). The raw reads generated by sequencing were filtered by eliminating adaptor sequences and lowquality reads and then assembled by the Trinity program with the default parameters [36]. The longest transcripts of each gene were chosen as unigenes.

Differentially Expressed Gene (DEG) identification
The expression levels of each gene were calculated by RSEM [41]. Differential expression analysis of genes in grazed and ungrazed samples was performed using the NOIseq approach [42]. We set fold change � 2.00 and probability � 0.8 as thresholds to identify the genes that were expressed at a significantly different level. MISA [43] and Primer3 [44] were used to detect SSRs from the transcriptome and design primers, respectively. We selected 21 SSR molecular markers derived from DEGs to screen grazing-related SSRs and then determined the optimal annealing temperature by PCR amplification (S1 Table). Total genomic DNA was extracted using the TIANGEN plant genomic DNA kit (Tiangen Biotech, Beijing, China) for all samples following the manufacturer's instructions. The quality and quantity of DNA were determined by using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The PCR amplification reaction mixture (25 μL) contained 1 μL template DNA (30-40 ng μL-1), 0.5 μL (10 pM) of each primer, 12.5 μL Premix Taq (TaKaRa Biotechnology Co., Dalian, Liaoning Province, China), and 10.5 μL double distilled water (ddH2O). The PCR amplification parameters were as follows: 4 min at 94˚C; 35 cycles of 30 s at 94˚C, 30 s at a primer-specific annealing temperature, 30 s at 72˚C and a final extension step at 72˚C for 10 min. PCR products were then separated using capillary electrophoresis and genotyped using an ABI 3730 DNA analyzer with a GeneScan 500 LIZ size standard (Applied Biosystems, Beijing, China). Finally, we screened eight primers with polymorphic loci ( Table 2). Polymorphism information content (PIC) values of markers were calculated using Cervus 3.0.7 [45].

Genetic diversity analysis
We used the eight abovementioned polymorphic SSR markers to analyze genetic diversity among grazed and ungrazed samples. First, GeneMarker version 2.6.0 (SoftGenetics, State College, PA, USA) was used to perform peak identification and fragment sizing. Then, we applied GenAlEx 6.5 [46] to convert the data format and to calculate the genetic parameters, including the observed number of alleles (Na), effective number of alleles (Ne), expected heterozygosity (He) and observed heterozygosity (Ho), gene flow (Nm), and Shannon's information index (I). R version 3.6.3 was used to perform Student's test to explore the relations of genetic parameters between grazed and ungrazed populations.
We matched unigenes to the GO database to clarify the cellular function of the gene products. We found that 50,089 unigenes were matched to more than 50 functional items, which were divided into three main groups: biological process, cellular component, and molecular function (Fig 2). The largest subcategory of biological process was "metabolic process" (26,800), followed by "cellular process" (26,513). Of the cellular component, "cell" and "cell parts" showed the most matches with unigenes at 26,939 and 26,833, respectively. Among the molecular function category, "binding" and "catalytic activity" showed the most matches at 24,708 and 22,732, respectively.
With the aim of validating protein functions, the COG database was used to compare with unigenes. A total of 37,880 unigenes were annotated into 25 COG functional categories ( Fig  3). The "general function prediction only" was the cluster with the most unigenes (10,786), followed by "transcription" (7184), "translation, ribosomal structure and biogenesis" (7048), "cell cycle control" (6488), and "function unknown" (6314), whereas only a few unigenes were assigned to "extracellular structures" and "nuclear structure".
KEGG annotation was performed to identify the metabolic pathway to which unigenes belonged, and 63,044 unigenes had matches in the KEGG database. These unigenes were assigned to 133 pathways.

Differential expression and pathway analysis
Using the NOI approach, we identified 686 DEGs, including upregulated (304 DEGs) and downregulated (382 DEGs). Then, a similarity search of the DEGs in the KEGG database was

PLOS ONE
Transcriptome response and genetic diversity of S. breviflora to grazing performed to explore the pathways associated with a significant enrichment in response to grazing. We found that these DEGs were sorted into 100 pathways. As presented in Table 5, the main enriched pathways included "alpha-linolenic acid metabolism" and "plant-pathogen interaction".

Development and validation of grazing-related SSR markers
MISA was used to mine SSR markers from transcriptomes using the following criteria: mononucleotides with at least 12 repeats, di-nucleotides with 6 repeats, tri-and tetra-nucleotides with 5 repeats, and penta-and hexa-nucleotides with 4 repeats. We examined 111,018 sequences and identified 19,607 SSRs (S2 Table). Additionally, 11,963 tri-nucleotide SSRs and 386 quad-nucleotide SSRs accounted for 61.01% and 1.97% of all the SSRs, respectively, representing the most and the least of all SSR markers. The CCG/CGG trinucleotide repeats were the most abundant motifs detected in SSRs (5186, 26.45%), followed by the AG/CT (3117, 15.90%), AGG/CCT (2611, 13.32%), and AGC/CTG (1716, 8.75%) motifs ( Table 6). The frequency of the remaining motifs accounted for 35.58% of the SSRs.
To develop grazing-related molecular markers, we preliminarily selected 21 DEGs and designed primers (S1 Table). After screening polymorphic SSRs, excluding primers with product sizes that did not meet the requirements and could not generate PCR products, eight grazing-related genes with SSR polymorphisms were found. In this study, PIC values ranged from 0.14 to 0.47, averaging 0.31, which indicated that these markers were informative and effective for genetic analysis ( Table 2).

Effects of grazing on genetic diversity
The eight primers yielded 29 alleles for 180 samples from the six sites. Among these primers, CL966.Contig12 and CL.5285.Contig5 produced the largest number of alleles, i.e. five alleles.

PLOS ONE
The fewest numbers of alleles were exhibited in CL1837.Contig15 and Unigene12747 with two (Table 7). On average, there were four alleles per primer. Shannon's information index of the primers ranged from 0.31 to 0.92, with a mean of 0.62.
The genetic diversity of S. breviflora can be calculated using Shannon's information index (I) for each population. As Table 8 indicates, the grazed populations showed the highest Shannon's information index value, with a value of 0.64 (pop4 and pop6). In contrast, the ungrazed populations showed the lowest level of genetic diversity, with a value of 0.44 (pop2). The observed numbers of alleles (Na) ranged from 2.13 (pop2) to 3.00 (pop4), with an average of 2.71. The effective number of alleles (Ne) varied from 1.47 (pop2) to 1.71 (pop4), with an average of 1.62. Compared with the mean observed heterozygosity of 0.04 (Ho), the mean expected heterozygosity (He) value was high, with a value of 0.34. This result indicated that the inbreeding rate of S. breviflora is high, and the proportion of heterozygotes is relatively low. As shown in Table 9, both Shannon's information index (I) and expected heterozygosity (He) in grazed populations are higher significantly than that in ungrazed populations (p<0.05).

PLOS ONE
Transcriptome response and genetic diversity of S. breviflora to grazing

Discussion
NGS strategies and their application to transcriptomics can be used to meet the increasing demands of acquiring accurate genome information. To clarify the transcriptome responses of S. breviflora to grazing, we compared the transcriptome changes under grazing and nongrazing conditions and found that 686 unigenes showed differential expression. These DEGs were significantly enriched in the "alpha-linolenic acid and metabolism" and "plant-pathogen interaction" pathways, which may enhance the adaptability of S. breviflora to grazing. Based on DEGs, we screened eight grazing-related SSR markers to explore the genetic diversity of S. breviflora and found that grazing has a positive effect on genetic diversity. Changes in gene expression levels help S. breviflora defend against grazing damage, and the increased genetic diversity provides the potential for its adaptation and evolution; all of these factors contribute to the survival of plants in different aspects.

Grazing stimulates the expression of wound-and defense-related genes
As a main effect of grazing on herbage, wounding threat is inevitable and occurs over the entire life [35]. In addition to developing adaptive traits such as low heights, small leaf sizes, low root biomass [47], few vegetative tillers, and short internode lengths [48], plants also modulate the expression levels of related genes, for instance, genes related to the jasmonic acid (JA) family [49], to defend themselves against grazing damage. For S. breviflora, genes such as phospholipases A 1 (PLA 1 ) and 13-lipoxygenase (13-LOX), which are known to be involved in the synthesis of JA, were upregulated. For other plants, leaf wounding response were also related to JA synthesis [50,51]. Singh et al. [52] elucidated the expression profile of rice PLA-encoding genes under salt, cold, and drought stress using microarray data and found that all the DEGs were upregulated. When tomato (Solanum lycopersicum L.) is exposed to methyl-jasmonate treatment and damaged, SILOX4, which belongs to the 13-LOX subfamily, is highly expressed

PLOS ONE
Transcriptome response and genetic diversity of S. breviflora to grazing [53]. PLA 1 and 13-LOX are shared by grazing and other wounding response in different species, indicating that plants have some common mechanisms in defense response. Herbivore grazing, however, is a joint process that includes wounding, defoliation, and saliva deposition. Chen et al. [54] used rice to study grazing-induced pathway, and found the expression level of 1,3-beta-glucosidase increased. This gene was reported to be associated with resistance in plant, cell wall construction and modification [55,56]. In our study, gene related to beta-glucosidase was upregulated under grazing, conferring resistance on S. breviflora. Grazing provides opportunities for pathogens to infect plants [57]. When plants are infected by pathogens, they develop disease resistance by activating mitogen-activated protein kinase (MAPK) cascades [58]. Specifically, MAPK kinase kinases (MAPKKKs) phosphorylate MAPK kinases (MAPKKs), which then phosphorylate MAPKs [59]. In the present study, MAPKK genes showed upregulated expression under grazing, suggesting that MAPKK genes

PLOS ONE
contribute to the defense against invasion by microorganisms. Gene overexpression in the MAPK cascade defense mechanism occurs widely in plants, such as rice [60], Arabidopsis [61], cotton [62], and wheat [63].

Grazing increases the genetic diversity of S. breviflora
Genetic diversity is a fundamental basis for plants to adapt to the environment, and to provide the potential for evolution. Genetic diversity is shaped by the balance between genetic drift, inbreeding, recombination, gene flow, mutation, and selection [64]. Grazing affects plant genetic diversity in different ways. Accompanied by trampling and livestock migration, grazing impacts genetic diversity of populations mainly through gene flow and genome-wide selection that changes the substitution rate of mutants [65].
Gene flow, one of the important forces leads to improving genetic diversity of natural plant populations, occurs by the spread of pollen, seeds, spores, and other genetic materials. S. breviflora is a facultatively selfing species. We found that pollen flow may promote sexual reproduction of S. breviflora. As Table 9 indicates, He in grazing populations are significantly higher than ungrazed populations. Therefore, pollen-mediated gene flow plays an important role in maintaining genetic diversity of S. breviflora populations while stressed by grazing [66]. In general, grazing affects plant reproduction by defoliation. S. breviflora, however, can complete life circle under heavy grazing by increasing reproductive investment, which brings about maintaining even increasing seed germination rate [67][68][69]. Grazing transfers seeds through livestock migration and fosters seedling establishment, resulting in the enhancement of seedmediated gene flow, and consequently increasing genetic diversity [66], though seed-mediated gene flow was not mentioned in the present study. Another important force driven genetic diversity changes is gene polymorphism. So, we deliberately mined SSRs derived from DEGs of S. breviflora. As a result, we found a higher genetic diversity in grazing populations. This result suggests that grazing promotes the polymorphism of genes with relevance to grazing. These polymorphisms may endue S. breviflora with more grazing tolerance.
Other factors such as marker systems and grazing intensity might be taken in to account. SSR system is more efficient for genetic analysis than the ISSR system [70]. We observed that gene flow of SSR primers in S. breviflora was higher than that of ISSR primers in S. grandis [25] and S. krylovii [24], which varied from 4.16 to 14.99, with a mean value of 8.33 (Table 7). In addition, grazing intensity has also been taken into account in other studies. Shan et al. [25] indicated that the genetic diversity of S. grandis was the highest under light grazing. Similar results were suggested for Elymus. nutans in the study by Ma et al. [26]. For S. krylovii [24] and S. purpurea [71], the highest value appeared in moderately grazed populations. Contradictory results, including a negative relationship [72] and no correlation [73], were reported in other

PLOS ONE
studies. These contradictory results imply the complexity of grazing-induced genetic diversity that are affected by grazing intensity, differences in species, geographic locations and molecular marker systems.

Conclusions
Overall, grazing activated plant defense-related genes and increased genetic diversity in S. breviflora. The desert steppe in Inner Mongolia has become severely degraded because of longterm overgrazing. The S. breviflora community, however, remains stable, and this species occur with S. bungeana, S. klemenzii, and S. krylovii in different types of steppes under diverse conditions of precipitation, temperature, and soil attributes. The reason for this may be explained in part by our findings, i.e., the enhancement of genetic diversity of S. breviflora under grazing improves the adaptability of the species to stressed environments. For maintaining genetic diversity of S. breviflora, a suitable grazing intensity is needed in order to implement rational use and conservation of desert steppe.
Supporting information S1