Gene regulatory network inference in soybean upon infection by Phytophthora sojae

doi:10.1371/journal.pone.0287590

Fig 1.

Schematic overview of the study design.

(a) Soybean plants harboring Rps1k were inoculated with a Race 1 P. sojae isolate, a Race 25 isolate, or sterile media. Inoculated hypocotyls were used for RNA-seq. Capture-seq was subsequently performed to validate the RNA-seq data. (b) Overrepresented TF families were identified from the RNA-seq analysis. DAP-seq data were generated or obtained for the families most represented by total abundance and by percentage of genome-wide family members. (c) DL models were trained using DAP-seq binding site data. The capacity of some models to generalize across a given TF family was evaluated both within and across species. For several TF families of interest, soybean- or Arabidopsis-based DNNs were trained and used to predict TFBS. (d) DNN predictions were overlapped with FIMO motif scans, and the high-confidence targets were used to construct a GRN.
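The overlap step in panel (d) can be illustrated with a minimal sketch. It assumes each method yields a list of (TF, target gene) pairs; the TF and gene names below are hypothetical placeholders, and the function names are not taken from the study's code.

```python
from collections import defaultdict


def high_confidence_edges(dnn_hits, fimo_hits):
    """Keep only (TF, target) pairs predicted by both the DNN and the motif scan."""
    return sorted(set(dnn_hits) & set(fimo_hits))


def outdegree(edges):
    """Count how many target genes each TF regulates in the edge list."""
    counts = defaultdict(int)
    for tf, _target in edges:
        counts[tf] += 1
    return dict(counts)


# Hypothetical predictions; real inputs would come from the DNN and FIMO outputs.
dnn_hits = [("WRKY_A", "gene1"), ("WRKY_A", "gene2"),
            ("RAV_B", "gene2"), ("RAV_B", "gene3")]
fimo_hits = [("WRKY_A", "gene1"), ("RAV_B", "gene2"),
             ("RAV_B", "gene3"), ("RAV_B", "gene4")]

edges = high_confidence_edges(dnn_hits, fimo_hits)
degrees = outdegree(edges)
```

Requiring agreement between the two methods trades sensitivity for specificity: a pair supported by only one predictor is dropped from the network.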

Fig 2.

Pathogenicity testing and transcriptome analysis.

(a) Disease development in Race 25- (top) and Race 1-treated (bottom) hypocotyls at seven days post-infection. (b) Venn diagram of DEGs between treatments. (c) TF representation among DEGs from RNA-seq. WRKY was the most represented TF family by total abundance, and RAV by percentage of genome-wide family members. (d) K-means clustering of DEGs. DEGs were assigned to nine co-expression clusters. Of these, seven displayed increased expression (log₂FC [FC] > 0) in infected vs. Mock treatments, while two displayed decreased expression (FC < 0). (e) Functional enrichment and TF representation for gene co-expression clusters. (left panel) Top five GO categories by adjusted p-value (≤ 0.05; data available in S7 Data). (middle panel) Top five KEGG terms by adjusted p-value (≤ 0.05). (right panel) Top three TF families by abundance for each cluster.
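The two rankings in panel (c) can differ because one counts DEGs per family while the other normalizes by family size. The sketch below assumes that the percentage is the share of a family's genome-wide members found among the DEGs; the counts are invented for illustration and are not the study's values.

```python
def family_representation(deg_counts, genome_counts):
    """Return per-family DEG abundance and percent of genome-wide members."""
    rows = {}
    for family, n_deg in deg_counts.items():
        rows[family] = {
            "abundance": n_deg,
            "percent_of_family": 100.0 * n_deg / genome_counts[family],
        }
    return rows


# Hypothetical counts chosen only to show how the two rankings can disagree.
deg_counts = {"WRKY": 60, "RAV": 4}
genome_counts = {"WRKY": 200, "RAV": 5}

rep = family_representation(deg_counts, genome_counts)
top_by_abundance = max(rep, key=lambda f: rep[f]["abundance"])
top_by_percent = max(rep, key=lambda f: rep[f]["percent_of_family"])
```

A large family can lead by abundance even when only a modest fraction of its members respond, whereas a small family can lead by percentage with only a few DEGs.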

Fig 3.

DAP-seq identification of GmWRKY30 and GmRAV TFBS.

(a) Distribution of DAP-seq peaks across genomic features. (b) Distance of peaks from the TSS.

GRN inference at 24 hpi.

(a) (left) log₂FC (FC) of DEGs across interaction types, (middle) WRKY and RAV binding site representation in the DEG set derived from DAP-seq, and (right) binding site representation for each TF family in the DEG set derived from CRNN + FIMO prediction. The bar plot shows the total number of target genes for each family, as well as the number of TF-encoding target genes (blue). (b) Hairball of the global GRN. Nodes and edges represent TFs and target genes, respectively. Node size corresponds to outdegree. (c) Scatterplot of the top co-occurring TF pairs by cosine association score, identified with TF-COMB. Datapoint color reflects the total number of shared targets for a given TF pair. (d) Prioritization of nodes. Nodes that were statistically enriched by Simple Enrichment Analysis and were represented in the transcriptome analysis (n = 118) were prioritized by outdegree, cumulative indegree, cumulative cosine, and mean |log₂FC| (Mean |FC|). Blue polygons represent the upper quartile for each parameter. Thirteen genes/14 TFs were in the upper quartile for all four parameters. (e) Hairball of the hub nodes. Node size corresponds to outdegree.
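The prioritization in panel (d) amounts to keeping nodes that fall in the upper quartile of every parameter. A minimal sketch follows, assuming an inclusive quartile definition and a "greater than or equal to" cutoff, neither of which is stated in the caption; node names and values are hypothetical.

```python
from statistics import quantiles


def upper_quartile_hubs(nodes):
    """Return nodes at or above the third quartile for every parameter."""
    params = sorted({p for metrics in nodes.values() for p in metrics})
    # Third-quartile cutoff per parameter (assumed definition: inclusive method).
    cutoffs = {
        p: quantiles([m[p] for m in nodes.values()], n=4, method="inclusive")[2]
        for p in params
    }
    return sorted(
        name for name, m in nodes.items()
        if all(m[p] >= cutoffs[p] for p in params)
    )


# Hypothetical scores for eight nodes; "g" is high everywhere except cosine.
names = "abcdefgh"
nodes = {
    name: {"outdegree": i, "cum_indegree": i, "cum_cosine": i, "mean_abs_fc": i}
    for i, name in enumerate(names, start=1)
}
nodes["g"]["cum_cosine"], nodes["b"]["cum_cosine"] = 2, 7

hubs = upper_quartile_hubs(nodes)
```

Requiring all four parameters at once is deliberately strict: a node that ranks highly on connectivity but not on expression change, or the reverse, is excluded from the hub set.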
