Mitogen activated protein kinase (MAPK)-regulated genes with predicted signal peptides function in the Glycine max defense response to the root pathogenic nematode Heterodera glycines

Glycine max has 32 mitogen activated protein kinases (MAPKs), nine of them exhibiting defense functions (defense MAPKs) to the plant parasitic nematode Heterodera glycines. RNA seq analyses of transgenic G. max lines overexpressing (OE) each defense MAPK has led to the identification of 309 genes that are increased in their relative transcript abundance by all 9 defense MAPKs. Here, 71 of those genes are shown to also have measurable amounts of transcript in H. glycines-induced nurse cells (syncytia) produced in the root that are undergoing a defense response. The 71 genes have been grouped into 7 types, based on their expression profile. Among the 71 genes are 8 putatively-secreted proteins that include a galactose mutarotase-like protein, pollen Ole e 1 allergen and extensin protein, endomembrane protein 70 protein, O-glycosyl hydrolase 17 protein, glycosyl hydrolase 32 protein, FASCICLIN-like arabinogalactan protein 17 precursor, secreted peroxidase and a pathogenesis-related thaumatin protein. Functional transgenic analyses of all 8 of these candidate defense genes that employ their overexpression and RNA interference (RNAi) demonstrate they have a role in defense. Overexpression experiments that increase the relative transcript abundance of the candidate defense gene reduces the ability that the plant parasitic nematode Heterodera glycines has in completing its life cycle while, in contrast, RNAi of these genes leads to an increase in parasitism. The results provide a genomic analysis of the importance of MAPK signaling in relation to the secretion apparatus during the defense process defense in the G. max-H. glycines pathosystem and identify additional targets for future studies.

Introduction Plants, like many organisms, respond to a number of biotic and abiotic challenges. These responses occur through complex signal transduction processes. Plants have a number of different signaling mechanisms that they use to respond to these challenges and included among them is the mitogen activated protein kinase (MAPK) cascade [1]. The MAPK cascade functions in eukaryotes by regulating many different cellular processes [2][3][4]. The MAPK cascade is a three-tiered signal transduction platform that is shared between eukaryotes [1]. It exists for cells to transduce input signals into a meaningful output [1]. Input information is processed through a stepwise series of phosphorylation events whereby MAP kinase kinase kinases (MAPKKKs) phosphorylate MAP kinase kinases (MAPKKs) that in turn phosphorylate MAPKs [5]. These events lead to an appropriate output response [5]. Due to this stepwise, shared processing, it has been determined that the MAPK cascade works as a cooperative enzyme, switching cells from one discrete state to another [6].
Many studies have shown the importance of the MAPK gene family functioning in plants during various processes including drought and cold stress as well as defense to pathogens [4,7]. The ubiquitous nature of MAPKs and their roles in signaling also means they function in important ways in agricultural plants and among the most important is Glycine max (soybean). However, G. max has many pathogens that affect agronomic production and among the most important is the plant parasitic nematode Heterodera glycines, (soybean cyst nematode [SCN]). H. glycines causes more agronomic loss than the rest of its pathogens combined losses, creating an urgent need to understand and mitigate its pathogenicity [8].
With regard to pathogen defense, the MAPK3/6 genes have been the most extensively studied in plants [9,10]. However, recent findings have expanded the breadth of MAPKs functioning in defense [11]. In those functional experiments performed in G. max, the entire 32 member MAPK gene family has been studied showing the G. max MAPK2, MAPK3-1, MAPK3-2, MAPK4-1, MAPK5-3, MAPK6-2, MAPK13-1, MAPK16-4 and MAPK20-2 have a defense role [11]. In a number of these cases, the MAPKs have been determined to be expressed within the H. glycines-induced feeding structure (syncytium) produced from a root pericycle cell that merges with 200-250 neighboring cells through the activities of the parasite. The syncytium is also the site of the localized defense response [11,12]. Syncytium expression has been observed for MAPK2, MAPK3-1, MAPK4-1, MAPK5-3 and MAPK6-2, MAPK16-4 and MAPK20-2 [11]. Additional analyses of the expression of already proven defense genes, using quantitative real-time PCR (qRT-PCR), determined that a number of the defense MAPKs regulate the relative transcript abundance of already proven defense genes. The results are consistent with prior results that have examined the relative transcript abundances of various defense genes in this pathosystem [13,14].
Within the G. max-H. glycines pathosystem, experiments have also demonstrated that the MAPKs function in ways that link or even converge in on pathogen associated molecular pattern (PAMP) triggered immunity (PTI) and effector-triggered immunity (ETI) branches of the defense response [11,15]. Furthermore, experiments have demonstrated that the bacterial effector harpin is capable of inducing the expression of some of these defense MAPKs, including MAPK3-1, MAPK3-2 and MAPK5-3 [11,16]. Experiments that have been presented in other plant pathosystems have shown that harpin is capable of engaging the transcription of the coiled-coil nucleotide binding leucine rich repeat (CC-NB-LRR) NON-RACE SPECIFIC DISEASE RESISTANCE 1 (NDR1)/HARPIN INDUCED1 (HIN1) which can transduce input signals through the MAPK pathway [16][17][18][19]. The same observations have been demonstrated in the G. max-H. glycines pathosystem [11,20].
Also performing a role within this genetic defense network that is present in the G. max-H. glycines pathosystem is a homolog of BOTRYTIS INDUCED KINASE1 (BIK1) [11,13,21]. The A. thaliana BIK1 functions as a receptor-like cytoplasmic kinase (RLCK) at a key branch point occurring between brassinosteroid ligand-facilitated growth signaling and defense signaling. This signaling is mediated by BRASSINOSTEROID INSENSITIVE 1 (BRI1) [22]. The defense branch of this signaling platform is mediated by several different microbe associated molecular pattern (MAMP) receptors. These receptors include FLAGELLIN SENSING 2 (FLS2) and EF-Tu RECEPTOR (EFR) as well as the Arabidopsis DAMP PEPTIDE 1 RECEPTOR (AtPEPR1). Each of these receptors are shared between BIK1 and BRI1-associated kinase 1 (BAK1) [21][22][23][24][25]. The experiments demonstrate the importance of BIK1 in mediating signals from a number of different effectors. Furthermore, the results are consistent with experiments that demonstrated G. max BIK1 is expressed specifically during a defense response within syncytia undergoing a resistant reaction and functions in defense [13,26]. Recent results have also shown that the overexpression of the G. max BIK1 leads to induced expression of MAPK3-1 and MAPK3-2 [11].
The goal of the analysis presented here is to identify genes whose expression is regulated by MAPKs in G. max that function in defense to H. glycines, possibly aiding in understanding the convergence of ETI and PTI processes [27]. Toward this goal, previously generated RNA seq data has been analyzed from the transgenic defense MAPK-overexpressing (MAP-K-OE) lines generated in G. max [28]. The analysis has resulted in the identification of a core set of 309 genes that are induced in common between the different MAPK-OE lines (referred to as MAPK-OE-all). Those induced genes then have been compared to previously generated data that identified genes that have been determined to exhibit expression within H. glycines-induced syncytia undergoing a defense response in two different G. max genotypes that are capable of undergoing two different forms of a defense reaction [11,29]. The 71 identified genes have been further analyzed for the presence of a secretion signal since earlier studies have demonstrated the importance of the G. max secretion system to defense in this pathosystem [13,14,26,30]. The analysis has resulted in the identification of 8 candidate defense genes that are induced in the MAPK-OE-all lines, are expressed within the syncytial cells undergoing the process of defense and are predicted to have secretion signals. These 8 candidate defense genes, identified from this pool, have been functionally tested through transgenic experiments and show they have a defense role. We note here that these are not the only genes that appear to have a role in the defense process, but have been examined as part of a larger ongoing series of studies to better understand the defense process in the G. max-H. glycines pathosystem.

Genetic stocks
The MAPKs that have a defense function against H. glycines have been described [11]. Overexpression of the G. max defense MAPKs (MAPKs) has been performed in the H. glycines susceptible line G. max [Williams 82/PI 518671] [11]. The growth, culture of the genetic lines, RNA isolations and confirmation of the expression of the genetic constructs have been described in McNeece et al. [11].

RNA seq analyses
The RNA sequencing (RNA seq) data used in this analysis has been obtained from our prior study and is available as BioProject ID PRJNA664992, Submission ID: SUB8182387 [28]. Single replicate generation of RNA seq data of RNA isolated from the 9 MAPK-OE lines, including MAPK2 (Glyma.06G029700), MAPK3-1 (Glyma.U021800), MAPK 3-2 (Glyma.12G07 3000), MAPK 4-1 (Glyma.07G066800), MAPK 5-3 (Glyma.08G017400), MAPK6-2 (Glyma.07G206200), MAPK 13-1 (Glyma.12G073700), MAPK16-4 (Glyma.07G255400) and MAPK20-2 (Glyma.14G028100) and controls has been performed by Omega Bioservices, 400 Pinnacle Way, Ste 425, Norcross, GA 30071 and described [11,28]. RNA sequences have been deposited and made publicly available in the MAPKDB: A MAP kinase database for signal transduction element identification) [28]. The MAPKDB database, storing essential RNA seq data relating to the G. max MAPK-OE and RNAi experiments, allows data to be retrievable based on gene identification (gene ID). The MAPK database stores descriptions of every gene that has been obtained from the RNA seq study samples, including eukaryotic orthologous groups (KOG), gene ontology (GO) assignments and protein families (PFAM). A Microsoft SQL Server 2016 Enterprise Edition has been used to design, implement and host the developed MAPKDB. The MAPKDB, as it is designed, allows users to browse, search and download the data using gene IDs or descriptions. The sample differential gene expression results are compared by the users [11].

Detection call methodology (DCM)
The relevant p-values obtained from the DCM analysis examining RNA obtained from lasermicrodissected (LM) root cells performed in Matsye et al. [26] are provided here as supplemental data. The DCM procedure has been presented, but some details are provided for analysis methodology clarity [26,29] [26,29]. The infections have led to the expected resistant outcome as proven through histocytological analyses [26,29]. LM has been used to collect pericycle cells at a time point prior to H. glycines infection (the 0 days post infection [dpi] control) [26,29]. In those same analyses, syncytia undergoing the process of resistance at 3 and 6 dpi have been collected by LM [26,29]. The cDNA probes have been produced from the 0, 3 and 6 dpi RNA samples to be used in hybridization experiments to the Affymetrix1 Soybean Gene Chip1 [26,29]. The experiments using the probe that has been generated for each soybean genotype (G. max [Peking/PI 548402] and G. max [PI 88788] ) and have been run in three independent biological replicates. This process has resulted in the production of 6 total arrays for each time point (3 arrays for G. max [Peking/PI 548402] and 3 arrays for G. max [PI 88788] ). The DCM microarray-generated gene expression data that has been obtained from the syncytium samples undergoing two different forms of a defense response have originally been analyzed using Bioconductor1. The microarray expression levels that have been measured by the probe sets have been extracted using the Robust Multichip Average (RMA) methodology as implemented in the Affymetrix1 Bioconductor1 package consisting of (1) convolution background correction, (2) quantile normalization, and (3) summarization based upon a multi-array model of the normalized and log (base 2) transformed probe set data. The presence or absence of a particular probe set's (gene's) transcript on a single array have been determined using the Bioconductor1 implementation of the standard Affymetrix1 DCM. DCM consists of four steps. The DCM consists of (1) removal of saturated probes, (2) calculation of discrimination scores, (3) p-value calculation using the Wilcoxon's rank test, and (4) making the detection call. The algorithm determines if the presence of a probe set's transcript is provably different from zero (present [P]), uncertain (marginal [M]), or not provably different from zero (absent [A]). Measurable (M) expression is a p value less than 0.05 (p < 0.05). Not measurable expression (NM) is a p value greater than or equal to 0.05 (p � 0.05). The gene expression data for the MAPK-OE-all induced gene expression data has been extracted from this data set. An important aspect of these studies is that due to how the Affymetrix 1 microarray has been constructed, some genes have had no probe set fabricated onto the array. Consequently, gene expression could not be quantified for those genes according to the analysis procedures. In these cases, the gene expression analysis is not applicable (n/a) to those genes. Genes lacking Affymetrix1 probe sets have not been considered further in the analysis presented here.

Gene Ontology analysis
Gene Ontology (GO) analyses have been performed on the protein sequences composing the list of 309 induced genes. The GO analyses have been retrieved from Phytozome. The specific tool used was PhytoMine (https://phytozome.jgi.doe.gov/phytomine/begin.do). Graphs have been generated using Excel. The analyses of the MAPK-OE-all induced and suppressed genes have been divided into three different categories including biological process, molecular function and cellular component. The same analyses have been performed for the 71 genes that are both induced in the MAPK-all-OE transgenic lines and expressed within the syncytium in the (G. max [Peking/PI 548402] and G. max [PI 88788] ) genotypes.

The cloning of candidate defense genes into the destination vectors
The pRAP destination vectors are based off of the published Gateway1 cloning vector platform [11,30,31]. The candidate defense gene amplicons that have been generated by polymerase chain reaction (PCR) have been ligated in the pENTR/D-TOPO1 entry vector (Invitrogen1) according to the manufacturer's instructions. The PCR primer sequences that have been used in the reactions are provided (S1 Table). LR Clonase1 (Invitrogen1) has been used to facilitate the transfer the candidate resistance gene amplicon to the pRAP15 overexpression and pRAP17 RNAi destination vectors according to the manufacturer's instructions. As the controls for the experiments, the un-engineered pRAP15 or pRAP17 vectors have the ccdB gene located in the position where, otherwise, the candidate resistance gene amplicon would be inserted during the LR clonase reaction. As a result, this feature makes the pRAP15-ccdB (overexpression control) and pRAP17-ccdB (RNAi control) un-engineered vectors suitable controls for any non-specific effects caused by gene overexpression or RNAi [11]. The destination vectors have been used in freeze-thaw incubation experiments to transform chemically competent Agrobacterium rhizogenes K599 (K599) with the designated gene cassette [11].

Production of transgenic plants
The vectors and procedures have been published [30,31]. Visual selection of transgenic G. max roots is performed with the enhanced green fluorescent reporter (eGFP) [11]. Roots exhibiting the eGFP reporter expression will also possess the candidate defense gene expression cassette, each with their own promoter and terminator sequences. K599 transfers the DNA cassettes located between the left and right borders of the pRAP15 and pRAP17 destination vectors into the root cell chromosomal DNA. The result is a stable transformation event in the root somatic cells even though the construct is not incorporated into the germline. Roots subsequently develop from this transgenic cell over a period of a few weeks from the base of the shoot stock. The resultant genetic mosaic has a non-transgenic shoot having a transgenic root system so in the experiments presented here, each individual transgenic root system is an independent transformant line. The transgenic plants are planted in SC10 Super cone-tainers (Stuewe and Sons, Inc.1) that are secured in RL98 trays (Stuewe and Sons, Inc.1) which are allowed to recover for two weeks prior to the start of the experiment [11]. The functionality of the genetic constructs in G. max is confirmed by qRT-PCR (Please refer to quantitative PCR [qRT-PCR] section).

The synthesis of cDNA
The method has been performed according to McNeece et al. [11]. G. max root mRNA has been isolated using the UltraClean1 Plant RNA Isolation Kit (Mo Bio Laboratories1, Inc.) according to the manufacturer's instructions [11]. The removal of genomic DNA from the mRNA is done with DNase I (Invitrogen1) according to the manufacturer's instructions. SuperScript1 First Strand Synthesis System for RT-PCR (Invitrogen1) using the oligo d(T) as the primer is used to synthesize the cDNA from mRNA according to the manufacturer's instructions.

Quantitative real-time PCR (qRT-PCR) assessment of gene expression
The method has been performed according to McNeece et al. [11]. Assessment of candidate resistance gene expression in G. max has been accomplished by qPCR. The methods have used Taqman1 6-carboxyfluorescein (6-FAM) probes and Black Hole Quencher (BHQ1) (MWG Operon; Birmingham, AL) (S1 Table) [11]. The qPCR control, used in the G. max experiments, has been designed from a ribosomal S21 protein coding gene (Glyma.15G147700) [11]. The fold change in gene expression that has been caused by the genetic engineering event has been calculated using 2 -ΔΔC T [11,32]. A Student's t-test has been used to calculate the p-values for the replicated qPCR reactions [33]. The procedures follow those presented by McNeece et al. [11].

Assaying the effect the genetic engineering event has on nematode parasitism
The infections of the transgenic plants by H. glycines has been performed according to the procedures described in McNeece et al. [11]. The female index (FI) has been calculated by procedures originally described by Golden et al. [34] and employed for transgenic research [11]. The FI is the community-accepted standard representation of the obtained data [11]. The FI = (Nx/Ns) X 100. In the procedure employed here, Nx is the pRAP-gene-transformed (experimental [candidate resistance gene]) line. Ns is the pRAP15/17-ccdB (control) line [11]. The FI has been calculated as cysts per mass of the whole root (wr) and also cysts per gram (pg) of root [11]. The whole root analysis is how the data has been presented historically [34]. The cyst per gram analysis accounts for possible altered root growth caused by the influence of the overexpression or RNAi of the candidate defense gene. Three independent biological replicates have been made for each construct with 10-20 individual transgenic plants each serving as experimental replicates within each biological replicate, statistically examined using the Mann-Whitney-Wilcoxon (MWW) Rank-Sum Test, p < 0.05 cutoff [11,35]. The MWW Rank-Sum Test is a nonparametric test of the null hypothesis not requiring the assumption of normal distributions [35].

Bioinformatics
Signal peptide prediction has been performed for the conceptually translated selected candidate defense genes. The procedure has employed SignalP-5.0 [36]. The procedure has used the default settings to identify the likelihood (p � 0.5) of having a predicted signal peptide. There are three types of peptides that can be identified, including a Sec signal peptide (Sec/SPI), a Lipoprotein signal peptide (Sec/SPII), a Tat signal peptide (Tat/SPI). Furthermore, No signal peptide at all (Other) could be determined [36].

Analysis of the transgenic constructs on root mass
Independent root growth analyses have been done in three biological replicates on one month old roots excised from the stem with fresh weight measurements made according to [11]. The percent difference in root weight has been calculated by taking the average weight of 10 candidate gene OE and 10 candidate gene RNAi cassette-expressing root masses for the target as compared to the respective 10 pRAP15-OE and pRAP17-RNAi controls, multiplied by 100. The p-values have been calculated by a Student's t-test [33].

Defense MAPK gene expression
An analysis pipeline has been developed to identify genes that (1) are induced in common between each of the MAPK-OE lines and (2) are expressed in H. glycines syncytia undergoing the process of defense and (3) have secretion signals (Fig 1). RNA seq data has been generated from RNA isolated from the G. max defense MAPK-OE lines for MAPK2, MAPK3-1, MAPK3-2, MAPK4-1, MAPK5-3, MAPK6-2, MAPK13-1, MAPK16-4 and MAPK20-2 [28]. In analyses presented here, genes having induced or suppressed relative transcript abundances as compared to the pRAP15 engineered control have been determined for each MAPK-OE line ( Table 1, S2 Table). The study has determined that there is a core set of genes whose expression is influenced by the overexpression of each of the defense MAPKs (MAPK-OE-all). The analysis has identified 309 genes with higher relative levels of transcript abundance as compared to the pRAP15-expressing control (p < 0.05) ( Table 1, S3 Table). In contrast, 815 genes have been identified to have lower relative levels of transcript abundance as compared to the pRAP15-expressing control (p < 0.05) (S4 Table).

Gene Ontology analysis for the MAPK-all induced accessions
To obtain a clearer understanding of the biological role of the identified genes, Gene Ontology (GO) analyses have been performed on these conceptually translated genes composing the list of 309 MAPK-OE-all-induced and 815 MAPK-OE-all-suppressed genes. Graphs that represent the biological process, molecular function and cellular component for the genes are provided (Fig 2). The 309 MAPK-OE-all-induced genes became the focus of the remaining analyses.

Comparative analyses of MAPK-OE-all induced accessions to syncytiumexpressed accessions
Comparative analyses have been performed that focus in on the 309 genes comprising the MAPK-OE-all induced RNA seq gene expression data in comparison to the 1,787 syncytium expressed genes [26,29]. The comparisons have resulted in the identification of a set of 167 genes (54.0%) that have Affymetrix1 microarray probe set identifiers assigned to them ( Table 1, S5 Table). These genes have been divided into 7 types (Types 1-7), based on the expression profile that has been generated by the data obtained from the laser microdissection experiment of the 0, 3 and 6 dpi time point syncytium RNA samples. Of the 167 genes that have expression in the MAPK-OE-all and also have Affymetrix1 probe set identifiers, 71 genes (42.51%) are observed to be expressed in at least one of the three selected time points. These three time points are 0 (control), 3 or 6 days post H. glycines infection (dpi) that relate to the syncytium gene expression study ( Table 1). Of these 71 genes, 45 (63.38%) do not exhibit expression prior to infection and cluster as Types 1-3 ( Table 1). The Type 1 category is defined by gene expression having not been measured at the 0 dpi time point (p � 0.05), but has been measured at the 3 and 6 dpi time points (p < 0.05). There are 16 genes (22.54%) that have been classified as having a Type 1 expression profile ( Table 1). The Type 2 category is characterized by gene expression not being measured at the 0 dpi time point (p � 0.05), but has been measured at the 6 dpi time point (p < 0.05) ( Table 1). There are 25 genes (35.21%) exhibiting Type 2 gene expression ( Table 1). The Type 3 category is characterized by gene expression not being measured at the 0 or 6 dpi time points (p � 0.05), but has been measured at the 3 dpi time point (p < 0.05) ( Table 1). Type 3 expression has been observed for 4 genes (5.63%) ( Table 1). The remaining 26 genes exhibit expression patterns that do include expression at the 0 dpi time point and span Types 4-6 (p < 0.05) ( Table 1). The Type 4 category is characterized by genes expressed at the 0 dpi time point only (p < 0.05) ( Table 1). There are 5 genes (7.04%) exhibiting Type 4 expression ( Table 1). The Type 5 category is characterized by gene expression at the 0 and 6 dpi time points (p < 0.05), but not at the 3 dpi time point (p � 0.05) ( Table 1). There are 5 genes (7.04%) exhibiting Type 5 expression ( Table 1). The Type 6 category is characterized by expression occurring at the 0, 3 and 6 dpi time points (p < 0.05) ( Table 1). There are 16 genes (22.54%) exhibiting Type 6 expression ( Table 1). The remaining 96 genes, of the 167 genes having Affymetrix1 probe sets (57.49%) that make up the Type 7 category, having not been observed to be expressed in the 0, 3 or 6 dpi time points (p � 0.05) ( Table 1, S6 Table).

Gene Ontology (GO) analyses
GO analyses have been performed on the 71 MAPK-all-OE induced genes that also have Affy-metrix1 probe set identifiers and are observed to be expressed in at least one of the three selected time points. The analyses have been divided into separate graphs that represent the biological process, molecular function and cellular component. The analyses are presented (Fig 3).

Signal peptide prediction of the MAPK-OE-all expressed genes
Analyses have demonstrated the G. max secretion system functions in its defense process toward H. glycines parasitism [13,14,26,30]. To examine this concept further, the 167 genes that have been identified as being expressed in the MAPK-all-OE RNA seq studies have been examined subsequently to determine if they have characteristics of secreted proteins. Putatively secreted proteins have been identified through the prediction of a signal peptide using the signal peptide detection program SignalP-5.0 [36]. The analyses have resulted in the  [11]. RNA has then been isolated from the MAPK-OE defense-MAPK lines as well as their accompanying pRAP15 controls for RNA seq analyses [28]. B. In an associated analysis of the gene expression occurring within the syncytium (upper right panel), two different G. max genotypes that are capable of a defense response (G. max [Peking/PI 548402] and G. max [PI 88788] ) have been analyzed. C. The RNA seq study has identified 309 genes that are induced in common between each of the MAPK-OE lines (MAPK-OE-all). D. The 1,787 syncytiumexpressed gene list has been to be used to identify whether any of those genes are regulated by MAPKs (shown in C). E. A comparison of C and D gene lists has resulted in the identification of 71 genes that are in common between the MAPK-all and syncytium expressed gene analyses. F. Prior analyses have demonstrated the G. max secretion apparatus is important in defense to H. glycines parasitism [13,14,30]. The 71 candidate defense genes have been further narrowed down for the functional studies by conceptually translating their protein products to determine if any are predicted to be secreted proteins. The analyses has employed SignalP-5.0 leading to the identification of 8 (of the 71) genes that have a predicted secretion signal [36]. These 8 genes became the candidate defense genes that have been tested in transgenic studies presented here. Please see the Materials and Methods section for analysis details.

Preparation of transgenic plants
The 5 Type 1 and 3 Type 2 putatively secreted proteins, selected from the MAPK-All-OE syncytium-expressed accessions having predicted secretion signals, became the candidates for the transgenic study. Prior to the transgenic experiments, the expression of the 8 candidate genes have been confirmed in the defense MAPK-OE transgenic lines that have been used originally to generate the RNA-seq data. The quantitative real time PCR (qRT-PCR)-based experiments confirm that the candidate defense genes are induced in their expression in each of the defense MAPK-OE transgenic lines (P < 0.05) (Fig 4).
The genes subsequently have been genetically engineered for their overexpression in the H. glycines-susceptible G. max [Williams 82/PI 518671] . These experiments have been performed to determine if an engineered defense response could be obtained. In contrast, the candidate genes have then been genetically engineered for their RNAi in the H. glycines-resistant G. max [Peking/PI 548402] . These experiments have been performed to determine if an engineered impairment of the normal defense response could be generated. The combination of engineered resistance in the G. max [Williams 82/PI 518671] genotype and engineered susceptibility in the G. max [Peking/PI 548402] genotype is indicative that the gene functions within the defense response to H. glycines parasitism [13]. The effect that the expression of the OE or RNAi transgene has on the relative transcript abundance of its target has been determined by qRT-PCR, confirming the target gene cassettes are functioning as expected as compared to the ribosomal S21 control (p < 0.05). The qRT-PCR experiments show that the candidate gene is increased in its relative expression in the overexpressing lines as compared to the pRAP15 control (p < 0.05) (Fig 5). The increase in target gene expression has a range of 3.23 fold for the secreted peroxidase (Glyma.01G171100) to 5.66 for the endomembrane protein 70 protein (Glyma.09G096700) (Fig 5). In contrast, the qRT-PCR experiments show that the candidate gene is decreased in its relative expression in the RNAi lines as compared to the pRAP17 control (p < 0.05) (Fig 5). The decrease of target gene expression has a range of -3.19 fold for the endomembrane protein 70 protein (Glyma.09G096700) to -4.99 for galactose mutarotase-like superfamily protein (Glyma.19G020700) (Fig 5).

Functional analyses of the MAPK-All, syncytium-expressed genes having secretion signals
Experiments have been conducted to determine the effect that the expression of the transgene cassettes have on H. glycines parasitism. The roots have been infected with 2,000 J2s and infection has been allowed to proceed for 30 days. At the end of the infection period, the H. glycines analysis-biological process. B. GO analysis-molecular function. C. GO analysis-cellular component. Gene Ontologies, specifically molecular function, have been retrieved from Phytozome, using the PhytoMine tool (https://phytozome.jgi. doe.gov/phytomine/begin.do). Graphs have been generated using Excel. https://doi.org/10.1371/journal.pone.0241678.g003

PLOS ONE
cysts have been collected from the whole root and associated soil samples (wr) in which the transgenic plant had been growing. Furthermore, the roots have been weighed to permit the standardization of the cyst data to the mass of the root in cyst per gram (pg) of root. The pg analysis have been done to take into consideration whether or not the overexpression or RNAi of the transgene affects root development. The enumerated female index (FI) for each transgene has revealed that the overexpression of the candidate defense genes has led to a decrease in parasitism with a low of 19.56% observed in the pg analysis of the secreted peroxidase (Glyma.01G171100) to a high of 54.87% observed in the wr analysis of hydrolase-32 (Glyma.13G349300) (p < 0.001) (Fig 6). To obtain a clearer understanding of whether these genes play a defense role, RNAi of these same 8 candidate resistance genes has been performed in the H. glycines-resistant G. max [Peking/PI 548402] to determine if engineered susceptibility could be obtained (Fig 6). The results show that H. glycines parasitism can be increased to statistically significant levels (p < 0.001). The largest increase in parasitism has been observed in the whole root analyses for the secreted peroxidase (Glyma.01G171100) (4.54 fold) while the smallest increase in parasitism has been observed for the hydrolase-32 gene (Glyma.13G349300) (2.56 fold). The combination of a statistically significant decrease in parasitism in the overexpression lines and statistically significant increase in parasitism in the RNAi lines is the criteria required for the gene have a role in defense in our analysis. The similarity in the values obtained for the cysts per whole root and cysts per gram of root indicates that there is no effect on root growth caused by the transgene.

Discussion
The involvement of MAPKs in G. max biological processes include, but are not limited to defense to different pathogens, arbuscular mycorrhizal associations during drought tolerance, wound responses, growth and other activities [37][38][39][40][41][42][43]. The annotation of the MAPK gene family presented by McNeece et al. [11] had been based off of earlier studies that led to the identification of different numbers of MAPKs. These differences in gene count arose due to the analyses procedures used [39,44,45]. McNeece et al. [11] presented a specific set of criteria used to select the chosen G. max MAPKs, based off of comparisons made to MAPKs already identified in A. thaliana and those already identified in G. max. Different numbers of G. max MAPKs, therefore, are possible due to the different set parameters [39,44,45]. The importance of the analysis became evident when MAPKs that had not previously been associated with defense had been shown to work in that manner by suppressing H. glycines parasitism when overexpressed while, in contrast, facilitating parasitism when undergoing RNAi. The 9 MAPKs that have been identified as functioning during the defense response that G. max has to H. glycines are referred to as defense MAPKs [11]. However, many details regarding these MAPKs remained to be explored. The target genes are a galactose mutarotase-like (GAL MUT, Glyma.19G020700), Pollen Ole e 1 allergen and extensin family protein (EXT, Glyma.13G178700); endomembrane protein 70 protein family (ENDO 70, Glyma.09G096700); O-Glycosyl hydrolases family 17 protein (HYDROL-17, Glyma.14G020000); glycosyl hydrolases family 32 protein (HYDROL-32, Glyma.13G349300); FASCICLIN-like arabinogalactan protein 17 precursor (AGP 17, Glyma.12G096300); peroxidase superfamily protein (PEROX, Glyma.01G171100); pathogenesis-related thaumatin superfamily protein (PR, Glyma.12G064300). A minimum cutoff is set at ± 1.5 fold. � Statistically significant. The pvalues (p < 0.001) for the replicated qPCR analyses have been calculated through a Student's t-test [33]. Please refer to Methods for analysis details. The qPCR analyses have been averaged for 3 independent replicates. MK = MAPK. https://doi.org/10.1371/journal.pone.0241678.g004
The experiments presented here build on those studies by analyzing RNA seq data that has been obtained from the MAPK-OE lines for each of the 9 G. max defense MAPKs [28]. While Alshehri et al. [28] had developed a database to house and analyze the RNA seq data, the actual data had not been studied in a manner that would identify candidate defense genes that may be involved in impairing H. glycines parasitism. That analysis has been presented here. The analysis presented here has been able to put the MAPK RNA seq data into the perspective of previously identified gene expression events occurring specifically within the syncytium undergoing the process of defense in two different G. max genotypes that are capable of a defense response (G. max  and G. max [PI 88788] ) [26,29]. By comparing the MAPK-all-OE RNA seq and syncytium expression data sets, it has been possible in the analysis presented here to obtain a perspective regarding the defense process that compliments the earlier work.

RNA seq analyses identify a core gene set expressed in common between the defense MAPKs
The RNA seq analyses presented here have identified a core set of 309 genes that are increased in their relative transcript abundance while 815 are decreased among the MAPK-all-OE lines as compared to their control (p < 0.05). These results are consistent with gene expression studies performed in A. thaliana that have shown changes in gene expression occurring at the genomic level during a defense response [49]. Among the genes that are increased in their relative transcript abundance are a number of receptors having defense functions as resistance (R) genes. These genes will be described briefly as some of them relate to previously identified genes that function during the defense response that G. max has to H. glycines.
A number of LRR containing genes have been identified (Glyma.10G263200; Glyma.12G238600; Glyma.13G266300; Glyma.08G083300 [FLAGELLIN SENSING2 {FLS2}]; Glyma.10G195700 [DAMP PEPTIDE 1 RECEPTOR {PEPR1}]). The A. thaliana FLS2 protein has been shown to interact with a bacterial flagellin protein epitope to activate BRASSINOS-TEROID INSENSITIVE1 ASSOCIATED KINASE1 (BAK1) [22,25]. In turn, BAK1 phosphorylates BIK1 [28,63,64]. BIK1 can also be autophosphorylated, indicating that it functions differently than BAK1 [65]. Pant et al. [13] has demonstrated the function of the G. max BIK1 in defense in the G. max-H. glycines pathosystem. Furthermore, the G. max BIK1 results in an increase in GmMAPK3 transcription [11]. This class of genes also includes the A. thaliana AtPEPR1 which is shared between BIK1 and BAK1 [25]. Out of all of the receptors described here, only NDR1 has been shown to be expressed within the syncytium and induced by MAPK expression. The remaining receptors are only induced in their expression in the MAPK-OEall.

The identification of a core MAPK-induced defense transcriptome
A relatively low number of genes have been identified in the RNA seq screen presented here as compared to the G. max genome having 88,647 transcripts and 56,044 protein-coding loci. These results indicate that the transcriptional screen had been successful in terms of yielding a set of genes that would be manageable in functional studies such as those presented here. However, the genes then had been analyzed further to fulfill a narrower scope sought for the functional transgenic screens.
The 309 induced genes that have been identified in the MAPK-OE-all screen is relatively low and manageable for transgenic testing. However, we sought ways to further narrow down the gene list while also learning more about the properties of the genes that compose that group. We have been able to compare the 309 MAPK-OE-all induced genes to gene expression data generated from specific root cell types found in G. max [Peking/PI 548402] and G. max [PI 88788] . Prior analyses of the syncytium undergoing the defense process has been done [26,29]. The experiments resulted in the identification of a number of genes that are expressed specifically within the syncytium during the defense processes [26]. It had been expected here that a number of the 309 MAPK-OE-all induced genes would not be expressed at the site of the defense response. Such an outcome would further narrow down the quantity of genes. The result of the analyses on the 309 MAPK-OE-all induced genes left 167 genes that we were able to obtain gene expression data for. An analysis then had been performed to determine whether Affymetrix1 probe sets for these genes measured gene expression within the pericycle cells prior to parasitism (control) and parasitized root cells at 3 and 6 dpi [26]. In an examination of these data, we identified 7 different types of gene expression patterns. The characteristics of the gene activity have been referred to as expression Types 1-7. The main point taken from this analysis is that there had been 71 genes whereby detectable gene expression could be made for at least 1 of the 3 G. max root cell types (pericycle, 3 or 6 dpi syncytium undergoing the defense response) cell types. We observed that 45 of these genes are not expressed within the 0 dpi control, but are subsequently expressed in some combination within the 3 and 6 dpi time point samples (Types 1-3). We then observed that 26 genes incorporate expression at the 0 dpi time point (Types 4-6). The remaining 97 genes did not have measurable expression according to our criteria in any of the cells at the 0, 3 or 6 dpi time points, but have been shown to be induced in the MAPK-all-OE induced RNA seq analyses.

MAPK-induced genes expressed at 3 and 6 dpi syncytium-signal peptide predicted function in defense (Type 1)
Our analysis has identified 16 genes that are not expressed at 0 dpi, but then are expressed at the 3 and 6 dpi time points. Five of these genes are predicted to have secretion signals and are described here. The G. max galactose mutarotase-like superfamily protein (Glyma.19G020700) has not been studied in plants. Therefore, the functional work obtained in the study presented here is new to plants. However, the Saccharomyces cerevisiae galactose mutarotase protein functions in maintaining the equilibrium between the α-and β-anomers of galactose [66]. Ultimately, d-galactose is metabolized to d-glucose 1-phosphate whereby it is converted to dglucose 6-phosphate which can enter glycolysis [66]. Glucose 1-phosphate can also be bound to uridine triphosphate through the activities of UDP glucose pyrophosphorylase to produce UDP glucose. In A. thaliana UDP glucose phosphorylase functions in regulating programmed cell death and is essential [67,68]. The G. max pollen Ole e 1 allergen and extensin family protein (arabinogalactan (AGP) (Glyma.13G178700) has been shown in A. thaliana to perform roles in root epidermal cell elongation [69]. Furthermore, the A. thaliana AGP mutation root epidermal bulger 1 (reb-1) results in altered cortical microtubules, indicating a cross talk between cell wall modifying proteins and the cytoskeleton [70]. The G. max endomembrane protein 70 protein (EMP70) family (Glyma.09G096700) is a phylogenetically conserved protein. Experiments that have isolated A. thaliana Golgi proteins have revealed that EMP70 represents 25% of the quantified Golgi proteome [71]. Furthermore, EMP70 represents 50% of the top 10 most abundant Golgi proteins, but has been stated to have limited functional characterization [71]. Pant et al. [13] have already demonstrated the functionality of a xyloglucan endrotransglycosylase hydrolase (XTH) during the defense response G. max has toward H. glycines parasitism, reducing infection by 90%. XTH is processed through the Golgi apparatus, revealing the importance of the Golgi apparatus to defense in the G. max-H. glycines pathosystem [72]. The O-glycosyl hydrolases family 17 protein (Glyma.14G020000) and glycosyl hydrolases family 32 protein (Glyma.13G349300) relate to recent experiments performed in Nicotiana benthamiana [73]. The A. thaliana O-glycosyl hydrolases family 17 has been shown to promote hydrolytic elicitor release and consequently acting in immunity against the pathogenic bacterium Pseudomonas syringae [73]. The analyses presented here have provided further support for the hypothesis that the secretion system of G. max performs an important role that leads to its ability to overcome H. glycines parasitism (Fig 7) [26]. The experiments have also identified genes having other types of expression (Types 2-7) that may represent a pool of candidate genes with important functions in the defense response that G. max has toward H. glycines.

Downregulated genes
A number of downregulated genes have been identified during the course of the analyses. The identification of downregulated genes (as well as upregulated genes) is a common finding among any genomics-based analyses and it can represent how an event influences a number processes occurring within an organ. Early genomics studies have identified a shift from housekeeping to defense processes occurring within plant organs undergoing a defense process [49]. Consequently, identifying suppressed genes here is not surprising. The MAPK-based study presented here was only designed to identify genes that are induced or suppressed by certain MAPKs from RNA isolated from uninfected roots. The types of genes that have been identified may or may not have defense functions when examining the specific cells that are undergoing infection or even specific races/pathotypes of H. glycines. There also maybe specific defense functions for certain members of gene families like what has been shown in G. max for its MAPKs [11]. This would represent a shift from housekeeping functions to defense functions within these gene families. Further study is required to examine why some of these genes appear to be suppressed in common between the different defense MAPKs and is a rich area for future analysis.
Supporting information S1 Fig. Prediction of signal peptides. SignalP 5.0 has been employed, using default settings to identify the likelihood of having a predicted signal peptide. There are three types of peptides that can be identified, including a Sec signal peptide (Sec/SPI), a Lipoprotein signal peptide (Sec/SPII), a Tat signal peptide (Tat/SPI). Furthermore, No signal peptide at all (Other) could , when engineered to overexpress one of the 8 candidate defense genes, results in a resistant reaction. The reaction is not exactly like a naturally occurring resistant reaction like that occurring in G. max [Peking/PI 548402] (like shown in C) as indicated by the presence of more cysts (n = 3) than the naturally occurring resistant reaction (n = 1, in C). The level of engineered resistance would depend on the potency and/or timing of the expression of the gene. C. 8 Candidate defense genes have been determined to be more highly expressed in the syncytium undergoing a defense reaction to H. glycines parasitism in a naturally H. glycines-resistant genotype (i.e. G. max [Peking/PI 548402] ). D. The H. glycines-resistant genotype (i.e. G. max [Peking/PI 548402] ), when engineered to undergo RNAi of one of the candidate defense genes, results in an engineered susceptible reaction. The reaction is not exactly like a naturally occurring susceptible reaction as found in G. max [Williams 82/PI 518671] (as shown in A) as indicated by fewer cysts (n = 7) than the naturally occurring susceptible reaction (n = 10, in A). The level of engineered susceptibility would depend on the importance of the gene to the defense reaction, its potency and/or the timing of its expression. The relative numbers of cysts are arbitrary for descriptive purposes.