Abstract
Bovine respiratory disease (BRD) is a multifactorial disease of dairy and beef cattle that involves complex interactions with the host immune system. In the current study, a comprehensive meta-analysis was performed using a P-value combination approach. In the next step, the identified meta-genes were subjected to systems biology analysis using the weighted gene co-expression network analysis (WGCNA) method. Subsequently, the most functionally important modules and genes were validated using machine learning algorithms. Finally, the critical regulatory network associated with BRD was constructed. A total of 1,908 common meta-genes were identified through the combined analysis of differentially expressed genes (DEGs) using the Fisher and inverse normal (invnorm) approaches. Co-expression network analysis confirmed six functional modules, among which the connectivity patterns of the blue, brown, green, and yellow modules were significantly altered in BRD-affected cattle compared with healthy controls. Functional enrichment analysis of the significant modules revealed that the ‘Salmonella infection,’ ‘NOD-like receptor signaling pathway,’ ‘Necroptosis,’ ‘Toll-like receptor signaling pathway,’ ‘TNF signaling pathway,’ ‘IL-17 signaling pathway,’ ‘Apoptosis,’ and ‘Influenza A’ pathways were the most significantly associated with BRD. The constructed regulatory network identified GABPA, TCF4, ELK1, NR2C2, and ARNT as key transcription factors (TFs), each playing a central role in regulating immune and inflammatory pathways implicated in BRD. In addition, the constructed model revealed that differential expression of the CFB gene is significantly associated with susceptibility to BRD. In cattle, CFB expression correlates with clinical signs of respiratory disease, supporting its potential as a biomarker.
Moreover, the involvement of CFB in modulating pro-inflammatory cytokines (e.g., TNF) and its integration with other immune-related pathways (e.g., NF-κB signaling) further highlight its relevance as a biomarker. Overall, this integrative approach enhances our understanding of the molecular mechanisms underlying BRD and provides a foundation for developing diagnostic, therapeutic, and genetic selection strategies to improve cattle health and disease resistance.
Citation: Ghahramani N, Hashemi A, Panahi B (2025) Weighted gene co-expression network analysis identifies functional modules related to bovine respiratory disease. PLoS One 20(10): e0334688. https://doi.org/10.1371/journal.pone.0334688
Editor: Angel Abuelo, Michigan State University, UNITED STATES OF AMERICA
Received: February 15, 2025; Accepted: September 30, 2025; Published: October 27, 2025
Copyright: © 2025 Ghahramani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data are available in the Materials and methods section.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Bovine respiratory disease (BRD) is a multifactorial disease involving intricate host-immune interactions influenced by environmental variables and pathogens [1]. The innate immune system represents the first line of defense against BRD. Epithelial cells and immune sentinel cells help prevent infection by secreting proinflammatory cytokines. Neutrophils play an essential role in the pathogenesis of BRD by promoting inflammation and contributing to lung tissue damage [2]. Environmental factors, including weaning, transportation, overcrowding, and inadequate ventilation, negatively impact the host animal’s immune and non-immunological defense mechanisms [3]. Non-immunological defense in BRD involves physical and chemical barriers, which act as the first line of defense before the adaptive immune system is activated. These early defense systems play a crucial role in limiting pathogen entry and controlling infection at the onset of disease. BRD is commonly associated with bacterial and viral pathogens; the involvement of bacterial species, including Histophilus somni, Mannheimia haemolytica, and Mycoplasma bovis, as well as viruses, such as bovine respiratory syncytial virus, bovine viral diarrhea virus, and bovine herpesvirus-1, has been extensively investigated [4,5]. These bacterial and viral pathogens represent the most prevalent and clinically significant agents implicated in the development and progression of BRD. Detecting key genes involved in BRD enhances our understanding of the biological pathways that contribute to disease resistance, particularly those related to immune response and inflammation [6,7]. Vaccination programs and treatment of subclinical animals are the primary approaches for preventing and controlling BRD. Additionally, identifying genes associated with disease susceptibility can support the development of breeding programs aimed at improving cattle resistance to BRD [7].
RNA sequencing (RNA-Seq) is a powerful and comprehensive technique used to investigate gene expression profiles, functional mechanisms, and molecular processes within distinct biological pathways [8,9]. RNA-Seq currently represents the most precise and reliable method for estimating gene expression levels [10]. Several RNA-Seq studies have successfully identified key genes and host molecular mechanisms involved in the pathogenesis of BRD [1,11,12]. These studies have provided valuable insights into differentially expressed gene (DEG) patterns, immune response pathways, and regulatory networks that contribute to disease susceptibility and progression [1]. The genes STAT3, IKBKG, IRAK1, NOD2, TLR2, and IKBKB have been identified as key contributors to the immune response in bovine respiratory disease [13].
Individual RNA-Seq studies often face limitations such as small sample sizes, platform variability, and limited statistical power [14]. To address these challenges, meta-analysis provides a robust and standardized approach for integrating data from multiple studies, thereby increasing statistical power and improving the reliability of gene expression findings [15]. This meta-analysis approach has been used to identify gene expression profiles and evaluate the relative effectiveness of antibiotics in controlling BRD in beef cattle [16].
Weighted Gene Co-Expression Network Analysis (WGCNA) is a systems biology approach used to identify modules of highly correlated genes [17]. A co-expression network was constructed using WGCNA to identify significant modules and potential candidate biomarkers in BRD and healthy groups [18].
Functional enrichment analysis identifies specific biological functions that are overrepresented in a group of DEGs [19]. The associated functional pathways are obtained from online bioinformatics databases, and the relative abundance of genes relevant to particular pathways is statistically calculated [20].
A gene regulatory network consists of a set of genes that interact to control specific cellular functions. Understanding the relationships between target genes and transcription factor (TF)-target interactions provides valuable insights into the organization of these networks and their role in regulating gene expression and cellular processes [21,22].
Machine learning algorithms have become essential tools in genetics due to their ability to learn patterns from labeled data and make predictions or classifications [23]. Reports indicate that integrating co-expression network analysis with gene prioritization using machine learning (ML) frameworks is an effective approach for finding new protein functions, cellular and tumor expression profiles, and potential disease-related biomarkers [24,25]. Such approaches have been successfully applied to uncover complex functional regulation and predict expression signatures, providing critical molecular insights into diseases [26]. A subset of BRD-related genes, including PLDA, PHLDA2, VNSC, and PAM, was predicted using a decision tree (DT) algorithm [1].
This study aims to construct a gene correlation network based on gene expression data from infected and healthy groups. Furthermore, to validate previous findings obtained through DEG analysis and to identify potentially novel genes and molecular mechanisms associated with BRD, ML approaches were employed, and candidate biomarkers were proposed for BRD prediction.
Materials and methods
Data collection
The RNA-Seq data associated with BRD were obtained from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/gds/). Studies were selected based on specific criteria, including recent research on BRD, availability of accessible count matrices, clearly defined case and control groups, sufficient sample sizes for robust statistical comparisons, and consistent sequencing platforms. Only studies that utilized whole blood as the biological sample were included, providing consistency in tissue-specific gene expression profiles and minimizing variability introduced by different tissue types. Additionally, the availability of raw count matrices in publicly accessible repositories, such as GEO, was a crucial requirement, allowing for independent verification, re-analysis, and integration with other datasets as part of meta-analytical approaches. Five RNA-Seq studies related to BRD were systematically analyzed. The first dataset (GSE150706) profiled the blood transcriptomes of 24 beef steers (n = 72) at three critical stages: Entry (on arrival at the feedlot), Pulled (when disease is detected), and Close-out (recovered, healthy cattle at shipping to slaughter). The goal was to uncover key biological functions, regulatory factors, and gene markers for early diagnosis. Disease identification in the Pulled group was based on a combination of clinical scoring, elevated rectal temperature, and veterinary diagnosis. For analysis, blood transcriptomic data from the Entry and Pulled phases, each comprising 24 samples, were utilized. The second dataset (GSE152959) examined the whole blood transcriptomics of Holstein-Friesian calves infected with Bovine Respiratory Syncytial Virus (BRSV), including 6 control calves and 12 infected calves. The third dataset (GSE162156) comprised whole blood transcriptomic profiling of heifers without BRD (n = 18) and with BRD (n = 25). 
The fourth dataset (GSE199108) included young Holstein-Friesian calves infected with Bovine Herpesvirus 1 (BoHV-1) (n = 12) and uninfected calves (n = 6). The fifth dataset (GSE217317) examined the relationships between the transcriptome, genome, and BRD phenotype of feedlot crossbred cattle using multi-omics analyses, with a total of 143 samples collected from 80 cattle diagnosed with BRD and 63 pen-matched controls at a single time point. The studies were selected to ensure the reliability of results. Only studies published between 2020 and 2023 were included, representing recent research. To ensure statistical robustness, only studies meeting a predefined minimum sample size were considered. Articles were initially identified using the keywords: “whole blood RNA-seq,” “Bos taurus,” “BRD,” and “gene expression.” Table 1 summarizes the accession numbers, platforms, sample sizes, layout, and references for the mRNA-Seq datasets.
RNA-Seq data processing
Differentially expressed genes (DEGs) between infected and control samples were identified using the DESeq2 R package (v1.28.1) with default settings [31]. DESeq2 utilizes negative binomial generalized linear models to test for DEGs and calculates the dispersion for each gene based on its variance and expression level [32]. To employ the DESeq2 package, the Ensembl IDs in each count matrix were converted to GeneIDs. The Wald test was applied using DESeq2 to assess the statistical significance of DEGs between conditions. An absolute fold change of ≥ 2 and a corrected P-value of ≤ 0.05 were used as cutoff points for differential expression [33]. To minimize the impact of batch effects arising from variations in experimental protocols, sequencing platforms, alignment tools, and reference genome annotations, a normalization approach implemented in DESeq2 was applied. This method reduces unwanted sources of variation commonly encountered in high-throughput sequencing experiments, thereby enhancing the accuracy and comparability of gene expression measurements.
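The cutoff step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the DESeq2 implementation: it assumes the corrected P-value is a Benjamini-Hochberg adjusted P-value (the DESeq2 default), interprets the fold-change cutoff as |log2 fold change| ≥ log2(2) = 1, and omits DESeq2's independent filtering. The function names `benjamini_hochberg` and `select_degs` are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fold_changes, pvalues,
                min_fold_change=2.0, alpha=0.05):
    """Keep genes passing both the fold-change and adjusted P-value cutoffs.

    Assumption: the fold-change cutoff is applied symmetrically to up- and
    down-regulated genes on the log2 scale.
    """
    padj = benjamini_hochberg(pvalues)
    min_abs_log2fc = math.log2(min_fold_change)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fold_changes, padj)
        if abs(lfc) >= min_abs_log2fc and q <= alpha
    ]
```

Calling `select_degs(gene_ids, log2fc, raw_p)` on the per-gene results of one dataset returns the gene identifiers that satisfy both thresholds.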
Meta-analysis of RNA-Seq datasets
Meta-analysis has been widely applied in genetic research, particularly for identifying DEGs between two conditions (healthy and infected) in transcriptome analysis [34]. This method has been especially successful in identifying disease-related genes [35]. Meta-analysis encompasses a set of techniques that allow the quantitative combination of data from multiple studies [36]. To minimize the impact of batch effects arising from variations in experimental protocols, sequencing platforms, alignment tools, and reference genome annotations, we utilized P-value combination approaches. Specifically, the fishercomb and invnorm algorithms were employed to merge P-values, as implemented in the metaRNASeq R package (v1.0.5) [37]. Both methods were applied to reduce false positive results, and DEGs with a false discovery rate (FDR) < 0.05 in both statistical methods were considered significant common meta-genes. Fisher’s method combines the P-values from each experiment into a single test statistic defined as follows:
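$$F_g = -2 \sum_{s=1}^{S} \ln(p_{gs})$$
Under the null hypothesis, the test statistic $F_g$ follows a χ² distribution with $2S$ degrees of freedom, where $p_{gs}$ indicates the raw P-value obtained for gene $g$ in study $s$, and $S$ is the number of studies.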
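As an illustration of the two P-value combination techniques named above, the following Python sketch combines the per-study P-values of a single gene. It is not the metaRNASeq code. Fisher's statistic is −2 times the sum of the log P-values, referred to a χ² distribution with twice as many degrees of freedom as studies. The inverse normal method sums weighted normal quantiles of 1 − P. The weighting scheme is an assumption here (for example, weights based on study sample size), and the function names are hypothetical. P-values must lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method for one gene across S studies.

    Returns (statistic, combined P-value). The chi-square survival function
    with 2S degrees of freedom has a closed form for even degrees of freedom:
    exp(-x/2) * sum_{k=0}^{S-1} (x/2)^k / k!.
    """
    n_studies = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n_studies):
        term *= half / k
        total += term
    return statistic, math.exp(-half) * total


def inverse_normal_combine(pvalues, weights):
    """Weighted inverse normal (Stouffer-type) combination for one gene.

    Assumption: weights are supplied by the caller and are normalized here
    so that the combined statistic is standard normal under the null.
    """
    nd = NormalDist()
    norm = math.sqrt(sum(w * w for w in weights))
    z = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues)) / norm
    return z, 1.0 - nd.cdf(z)
```

The combined P-values from each method would then be corrected for multiple testing across genes, and genes significant under both methods retained as common meta-genes.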
Weighted gene co-expression network analysis
The WGCNA R package (v3.5.1) was used to detect correlation patterns among genes and identify significant modules across RNA-Seq datasets. Network construction, module identification, module and gene selection, calculation of network topological features, and visualization were all performed using the WGCNA approach. The expression values of meta-genes were normalized using the variance-stabilizing transformation (vst) function [38]. To increase the reliability of the constructed co-expression network, an outlier detection step was applied to remove unusual samples that could bias the results. After removing outliers, the number of retained samples exceeded 15, further enhancing the robustness of the study. Additionally, a signed method was applied for network construction. Pearson correlation was used for both outlier detection and meta-gene identification due to its computational efficiency [39]. Meta-genes were identified based on pairwise Pearson correlation coefficients. The relatively large sample size in this study also contributed to the robustness and reliability of the findings. Subsequently, the similarity matrix was converted into an adjacency matrix. The topological overlap matrix (TOM) and the corresponding dissimilarity matrix (1 − TOM) were derived from the adjacency matrix [17]. Finally, a dynamic hybrid tree-cutting technique was employed to identify modules, with average linkage hierarchical clustering performed using the topological overlap-based dissimilarity matrix as input [40].
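The conversion from correlation to adjacency and then to a TOM-based dissimilarity can be sketched as follows. This is a small pure-Python illustration under stated assumptions, not the WGCNA implementation: it uses the standard signed adjacency ((1 + cor) / 2)^β and the standard unsigned topological overlap formula, and the function names and the value of β are hypothetical. Genes with zero variance are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signed_adjacency(expr, beta):
    """expr: one expression vector per gene. a_ij = ((1 + cor_ij) / 2) ** beta."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = ((1.0 + pearson(expr[i], expr[j])) / 2.0) ** beta
            adj[i][j] = adj[j][i] = a
    # The diagonal stays at 0 so it does not count toward connectivity.
    return adj


def tom_dissimilarity(adj):
    """Return 1 - TOM, the input to average-linkage hierarchical clustering."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Shared neighbours: sum over u of a_iu * a_uj.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n)
                         if u != i and u != j)
            tom = (shared + adj[i][j]) / (
                min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j])
            diss[i][j] = diss[j][i] = 1.0 - tom
    return diss
```

In a signed network, strongly negatively correlated genes receive an adjacency near 0, so they are not grouped into the same module.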
Functional enrichment analysis
To investigate the functional significance of the identified modules, enrichment analysis was performed using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) via the STRING and DAVID databases [41]. This analysis identifies biological processes (BPs), molecular functions (MFs), and cellular components (CCs) that are significantly overrepresented. Detecting these enriched biological and molecular pathways provides valuable insights into the underlying biological mechanisms of BRD.
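The idea of overrepresentation can be made concrete with a hypergeometric tail probability. The sketch below is a generic illustration, not necessarily the statistic that STRING or DAVID compute. It asks how likely it is to see at least the observed number of pathway genes in a module if the module were drawn at random from the background. The function name and argument names are hypothetical.

```python
from math import comb


def enrichment_pvalue(module_size, hits_in_module, pathway_size, background_size):
    """P(X >= hits_in_module) under the hypergeometric distribution.

    module_size: number of genes in the module
    hits_in_module: module genes annotated to the pathway
    pathway_size: background genes annotated to the pathway
    background_size: total number of background genes
    """
    total = comb(background_size, module_size)
    upper = min(module_size, pathway_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, module_size - k)
        for k in range(hits_in_module, upper + 1)
    )
    return tail / total
```

In practice, the resulting P-values would be corrected for the number of GO terms or pathways tested.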
Inferring gene regulatory networks
To infer the regulatory network of significant modules within the constructed co-expression network, Cytoscape software was used to map TF-target interactions. By constructing gene regulatory networks, key regulatory genes associated with specific diseases, such as BRD, can be identified. The iRegulon plugin in Cytoscape was employed to identify these regulators and their corresponding TF-target interactions. Subsequently, regulatory relationships between the top target genes and the identified TFs were constructed and visualized, providing a comprehensive view of how these molecular components interact to influence disease-related pathways and processes.
Supervised machine-learning models
Supervised machine learning involves training a model on a dataset that includes input features (genes) and known outputs (healthy or disease status). The machine learning process can be divided into several key stages: data pre-processing, data splitting, model selection, model training, and model evaluation [42]. We employed machine learning algorithms to identify and validate the most significant meta-genes associated with BRD, ensuring precise gene prioritization with optimal accuracy. Feature selection algorithms based on weighting methods were applied to prioritize key genes related to BRD. Hub genes from significant modules were used to select important features using 10 distinct weighting algorithms, including information gain, information gain ratio, χ2, deviation, rule, support vector machine, Gini index, uncertainty, relief, and principal component analysis (PCA) [43]. Next, the selected features were used to optimize predictive modeling with the Decision Trees (DTs) algorithm. To evaluate the model’s accuracy, a 10-fold cross-validation approach was applied. In this method, the data were partitioned into 10 equal subsets; in each iteration, 9 subsets were used for training and 1 subset for testing, ensuring thorough assessment of the model’s generalizability and predictive performance. Ultimately, the model with the highest accuracy was proposed as the best predictor for identifying BRD-associated genes.
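Two of the ideas in this section, attribute weighting and 10-fold cross-validation, can be illustrated with a short Python sketch. It is a simplified illustration and not the software used in the study: it scores a single gene at a single expression threshold using information gain, gain ratio, and the decrease in Gini impurity, and it partitions sample indices into folds without shuffling or stratification. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_weights(values, labels, threshold):
    """Score the binary split `value > threshold` for one gene."""
    high = [lab for v, lab in zip(values, labels) if v > threshold]
    low = [lab for v, lab in zip(values, labels) if v <= threshold]
    n = len(labels)
    parts = [p for p in (high, low) if p]  # ignore an empty side
    info_gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    gini_gain = gini(labels) - sum(len(p) / n * gini(p) for p in parts)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"info_gain": info_gain, "gain_ratio": gain_ratio,
            "gini_gain": gini_gain}


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

A model would be trained on each `train` list and scored on the matching `test` list, and the ten accuracies averaged to estimate generalization performance.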
Results
Meta-analysis
The overall workflow of our systems biology approach is summarized in Fig 1.
The five RNA-Seq datasets comprise a total of 226 independent samples, including 132 BRD-infected samples and 94 control samples from dairy and beef cattle experiments.
As shown in Fig 2, the number of DEGs varies across datasets.
The meta-analysis results (see Fig 3) indicate that 1908 common meta-genes were identified.
Fig 3. Results of the meta-analysis showing 1908 common meta-genes identified by both methods for subsequent gene correlation analysis.
Weighted gene co-expression network construction
To obtain a reliable network and identify significant modules with high correlation, outlier samples were removed from the initial datasets. Samples with a standardized connectivity score (Z.k) lower than −2.5 were excluded, and the remaining samples were used for weighted co-expression network construction. After removing outliers, the high-quality data were subjected to WGCNA to explore underlying gene expression patterns. The soft-thresholding power (β) was selected by systematically evaluating the scale-free topology fit index (R²) across a range of candidate powers. The optimal β was defined as the smallest power at which the network exhibited approximate scale-free properties (R² > 0.8). This selection ensures that the constructed network reflects the expected biological architecture of gene co-expression, thereby enhancing the accuracy and interpretability of subsequent module detection and functional analysis. The six co-expressed gene modules identified, with an average size of 318 genes, are shown in Fig 4A. The size distribution of the co-expressed gene modules is summarized in Fig 4B. The heat map of the TOM illustrating gene interconnectedness within the modules is shown in Fig 4C. Module eigengenes representing expression patterns within each module are shown in Fig 4D.
B. Size distribution of the co-expressed gene modules, highlighting the turquoise module as the largest (1169 genes) and the gray module as the smallest (14 genes). Genes unrelated to any module were assigned to the gray module. The brown (n = 202 genes), yellow (n = 191 genes), blue (n = 207 genes), and green (n = 125 genes) modules were identified as critical functional modules associated with BRD through WGCNA analysis. C. Heat map of the TOM showing the degree of interconnectedness among genes within the modules identified by the dynamic tree cutting algorithm. Yellow and progressively red colors indicate low and high TOM values, respectively. D. Module eigengenes representing the gene expression patterns within each module, derived as the first principal component of each module’s expression data matrix.
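The two selection rules used before network construction, the Z.k outlier filter and the choice of the soft-thresholding power, reduce to simple comparisons once the underlying quantities are available. The following Python sketch assumes that per-sample connectivity values and per-power scale-free fit indices (R²) have already been computed elsewhere. The function names and input structures are hypothetical.

```python
import statistics


def standardized_connectivity(connectivity):
    """Z.k: connectivity centered and scaled by the sample standard deviation."""
    mean = statistics.mean(connectivity)
    sd = statistics.stdev(connectivity)
    return [(k - mean) / sd for k in connectivity]


def keep_samples(sample_ids, connectivity, z_cutoff=-2.5):
    """Drop samples whose standardized connectivity is below the cutoff."""
    z_scores = standardized_connectivity(connectivity)
    return [s for s, z in zip(sample_ids, z_scores) if z >= z_cutoff]


def pick_soft_threshold(fit_by_power, r2_cutoff=0.8):
    """Return the smallest power whose R^2 exceeds the cutoff, or None.

    fit_by_power: dict mapping candidate power -> scale-free fit index R^2.
    """
    passing = [power for power, r2 in fit_by_power.items() if r2 > r2_cutoff]
    return min(passing) if passing else None
```

Choosing the smallest qualifying power keeps as much correlation signal as possible while still meeting the approximate scale-free criterion.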
Functional enrichment analysis of meta-genes
The results of functional enrichment analysis of significant modules revealed 525, 58, and 57 GO terms for BPs, MFs, and CCs, respectively. The most critical terms enriched in the BPs included “Response to stress,” “Regulation of response to stimulus,” “Regulation of metabolic process,” “Positive regulation of biological process,” and “Regulation of immune response.” In the CCs category, significantly enriched terms included “Cytoplasm,” “Intracellular organelle,” “Nucleus,” “Nucleoplasm,” and “Cytoplasmic vesicle.” The most significantly enriched BPs and CCs identified from the analyzed gene modules are shown in Fig 5.
These terms highlight the main biological roles and cellular localizations involved in BRD.
In the MFs category, significant enrichment of binding-related terms was observed, including ‘binding,’ ‘protein binding,’ ‘enzyme binding,’ ‘ATP binding,’ and ‘small molecule binding.’ This enrichment indicates substantial involvement of molecular interactions that are critical for various biochemical and regulatory processes. Additional details related to MFs can be found in Table 2, which offers further insights and relevant data supporting the analysis.
The most over-represented KEGG pathways identified within the significant modules included the ‘NOD-like receptor signaling pathway,’ ‘TNF signaling pathway,’ ‘IL-17 signaling pathway,’ ‘NF-kappa B signaling pathway,’ and ‘T cell receptor signaling pathway.’ These pathways are known to play critical roles in immune response and inflammation. Comprehensive details regarding these significant modules, including associated genes and enrichment statistics, are provided in Table 3 to support further interpretation and analysis.
TF-hub gene regulatory network
There are 537, 154, 288 and 506 target genes in the blue, brown, green and yellow modules, respectively. The transcriptional regulatory network of the main target genes and TFs was established using Cytoscape. Key target genes were identified based on the intra-modular connectivity criterion. The TFs GABPA, TCF4, ELK1, NR2C2 and ARNT were identified as important regulators, controlling 24, 14, 20, and 22 hub genes in the blue, brown, green, and yellow modules, respectively. These TFs specifically regulate the hub genes within the significant modules associated with BRD. A large number of TFs can regulate hub genes, and their expression may reflect the complexity of mechanisms that lead to BRD. TCF4 was found to directly interact with the STK40, PRKAB1, MAP3K11, and OGFR genes in the brown, blue, yellow and green modules, respectively. Key TFs interacting with target genes within the constructed regulatory networks are shown in Fig 6.
Validation of hub genes in co-expressed modules
Initially, normalized gene expression values were assigned to the data matrix, which served as the foundational input for subsequent analyses. The DT method was employed to confirm the identified hub genes using four distinct criteria: information gain, accuracy, Gini index, and information gain ratio. The DT using the gain ratio criterion achieved the highest accuracy (70%). Based on the expression levels of meta-genes, the DT model confirmed the functional significance of the top-ranked genes in categorizing respiratory diseases in cattle. The DT highlighting the CFB gene as a potential biomarker for BRD is shown in Fig 7. Samples were classified as BRD when the expression levels of CFB and STAT2 were greater than 131.5 and 9161, respectively. Alternatively, samples were also identified as BRD when CFB expression was > 131.5, STAT2 ≤ 9161, NLRC5 ≤ 11922, IL15RA ≤ 800, and ALPK1 > 635. These findings indicate that CFB expression consistently plays a central role in BRD classification, depending on the expression context of other genes such as STAT2, NLRC5, IL15RA, and ALPK1.
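The two BRD-classifying paths described above can be written as explicit rules. The Python sketch below encodes only the thresholds reported in the text. The remaining branches of the tree are not described, so the function returns `None` for samples that match neither path rather than guessing a label. The function name and the dictionary input format are hypothetical.

```python
def classify_sample(expr):
    """Apply the two reported BRD paths of the decision tree.

    expr: dict of normalized expression values keyed by gene symbol
    (CFB, STAT2, NLRC5, IL15RA, ALPK1).
    Returns "BRD" when a reported path matches, otherwise None because the
    other branches of the tree are not specified in the text.
    """
    if expr["CFB"] > 131.5:
        # Path 1: high CFB and high STAT2.
        if expr["STAT2"] > 9161:
            return "BRD"
        # Path 2: high CFB, STAT2 <= 9161, and the remaining conditions.
        if (expr["NLRC5"] <= 11922
                and expr["IL15RA"] <= 800
                and expr["ALPK1"] > 635):
            return "BRD"
    return None
```

Both reported paths begin with CFB > 131.5, which is why CFB sits at the root of the tree and is proposed as the central marker.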
Discussion
Complex host immune interactions in dairy and beef cattle during BRD are influenced by pathogenic agents and environmental factors [44]. Additionally, the inclusion of animals from naturally infected and experimentally challenged studies may introduce variability in gene expression profiles. The complex regulation of these genes may correspond to the enrichment of specific biological pathways. However, due to limited sample sizes in some individual studies, the statistical power to detect reliable gene-level changes was insufficient. To overcome this limitation and enhance the reliability of the results, a meta-analysis approach was employed. By combining transcriptome datasets across multiple studies, this method increased the overall analytical power and provided new insights into the relationships and expression patterns of key regulatory genes associated with BRD. Furthermore, using the Fisher and invnorm methods, our analysis enabled more meaningful and robust identification of meta-genes. These outcomes were subsequently used in systems biology analysis to characterize patterns of gene correlation across samples. Modules of highly connected genes were then subjected to pathway enrichment analysis. Finally, gene regulatory networks between TFs and target genes were inferred using Cytoscape to elucidate effective genes involved in the development of BRD. Supervised ML algorithms were selected for this study, as the primary objective was to predict and validate genes based on known biological patterns and labeled data. Overall, the meta-analysis approach detected 1908 common meta-genes using the Fisher and invnorm methods. Systems biology analysis revealed that the topological characteristics of the BRD gene co-expression network differed, resulting in the formation of four co-expressed modules: blue, green, yellow, and brown. These modules were selected for further investigation.
The analysis of co-expression modules revealed key genes potentially involved in BRD. In the blue module, the most significant genes included PRCC, CHMP7, ABCF3, BAG6, and ADCK2; the brown module included TBC1D14, ELL, STK40, NOD2, and CYRIA; the yellow module featured MAP3K11, LRP10, LRPAP1, SPI1, and GPAT4; and the green module contained OGFR, GET3, ADRM1, MUL1, and WASHC1. The PRCC gene has been implicated in signaling cascades that may contribute to tumorigenesis in lung cancer [45,46]. The CHMP7 gene has been associated with immunodeficiency, translation regulation [47], membrane deformation, cell division, metabolism, and development; notably, its low expression may impair these biological processes [48]. BAG6, a novel regulatory protein, was found in porcine respiratory syndrome virus [49]. TBC1D14, detected in our study, was introduced as a novel metastasis biomarker by downregulating macrophage-erythroblast interactions [50]. The ELL gene connects the regulation of transcription elongation to cell growth [51]. STK40 was revealed as a key gene whose deletion alters the expression of genes important for lung development [52]. Disruption of NOD2 gene has been associated with impaired resistance to inflammatory diseases [53], while its interaction with BRSV suggests a potential role in the innate immune response to viral infection [54]. MAP3K11 was described as a target gene in lung cancer [55], contributing to the proliferation of activated epithelial cells [56]. SPI1 and PSTPIP1 were discovered as novel early prognostic biomarkers relevant to innate immune response in lung adenocarcinoma [57,58]. Further, OGFR [59] and GET3 play critical roles in regulating immunological functions, influencing both innate and adaptive immune responses. The immune system plays a key role in defending the body against pathogens, and maintaining cellular homeostasis. 
Regulating immune functions supports controlled inflammation, enhances antioxidant defense, facilitates cell signaling and repair, and promotes the removal of damaged cells [60]. According to our findings, ADRM1 plays a role in the migration, survival, and proliferation of cancer cells [61]. MUL1 was identified as a novel regulator of the antiviral response, limiting inflammation and downregulating mitochondrial respiration [62].
Functional enrichment analysis of the significant modules indicated that these modules were enriched in BPs such as “Regulation of defense response,” “Inflammatory response,” “Regulation of immune response,” “positive regulation of biological processes” [12], and “Regulation of signal transduction” [63] in BRD. Additionally, consistent with our results, several important MFs, including “signal transduction”, “Protein binding,” “enzymes binding,” [64,65] “ion binding,” and “catalytic activity” [40] have also been detected in bacterial respiratory infections in calves during BRD [66]. Analysis of KEGG pathways in the significant modules revealed that the co-expressed genes were highly enriched in pathways such as “Salmonella infection”, “NOD-like receptor signaling pathway”, “Toll-like receptor signaling pathway”, “Apoptosis”, “IL-17 signaling pathway”, “Tumor Necrosis Factor” and “NF-kappa B signaling pathway”. “Salmonella infection” has been shown to affect cattle, horses, and pigs causing reproductive losses and clinical disease such as BRD. Several previous studies indicated that the “NOD-like receptor signaling pathway,” “Toll-like receptor signaling pathway,” and “Apoptosis” are enriched in diverse respiratory bacterial and viral pathogens [67]. Our findings also revealed that the key genes in significant modules regulated the “IL-17 signaling pathway,” “Tumor Necrosis Factor,” and “NF-kappa B signaling pathway,” all of which are related to immune processes. Notably, similar pathway associations have been previously reported in studies of BRD [9,13]. The target genes and regulatory networks of TFs associated with BRD recurrence were identified, including 25 TFs, of which five key TFs were selected to construct a regulation network. Target genes and TFs with high connective degrees in the regulatory networks are reported to be associated with BRD. Our results indicated that the TFs GABPA, TCF4, ELK1, NR2C2 and ARNT are strongly associated with BRD. 
In line with our findings, the functions of the GABPA gene have been reported in the pathogenesis of leukemia and cancer [68–70]. TCF4 is a key TF involved in the regulation of Wnt/β-catenin signaling during lung development [71]. ELK1 has been shown to increase the inflammatory response and exacerbate lung damage in acute respiratory syndrome by controlling the transcription of Fcgr2b [72]. The expression of the NR2C2 gene, a potential target of miR-378b, was significantly upregulated in lung cancer tissues [73]. In addition, module preservation analysis identified NR2C2 as a hub gene in BRD [63].
BRD is influenced by both genetic and non-genetic factors. Genetic factors include inherited traits such as immune response and stress resilience, which may predispose animals to either susceptibility or resistance to the disease. Non-genetic factors involve environmental and management-related conditions, such as transportation stress, weaning, weather changes, and overcrowding. Given the importance of BRD in cattle, certain genes are expected to show considerable expression changes during the disease process. Previous studies employing ML algorithms in genetic analyses have successfully identified key genes associated with various diseases. Therefore, we applied supervised ML techniques, which enabled the accurate identification of key genes in cattle affected by BRD. Our analysis revealed that attribute weighting algorithms consistently highlighted the following genes: CFB, CLU, STAT2, NLRC5, MOCS1, IL15RA, CRYL1, SFXN3, ADAMTSL4, AARS1, ALPK1, and ACVRL1. These genes are primarily involved in inflammation, immune system function, and cellular proliferation and differentiation. Some of the studies included in this analysis were challenge experiments in which BRD was experimentally induced. Therefore, findings related to innate susceptibility to BRD were interpreted with caution. Among the identified genes, CFB, which regulates cellular senescence, was observed to be upregulated, consistent with previous reports of its role as an immune-related gene during BRD [28]. As indicated by the DT model, CLU functions as a tumor suppressor in the early stages of carcinogenesis, and has been suggested as a relevant gene in lung cancer [74]. STAT2 plays a critical role in immune responses and acts as a TF regulating the expression of numerous genes involved in host defense mechanisms during BRD [12]. MOCS1 participates in reactions that produce an essential cofactor and has been identified as a candidate causative mutation for bovine disease [75]. 
Additionally, our findings confirmed differential expression of IL15RA, CRYL1, and ALPK1 during BRD. IL15RA has been reported as a potential marker for genetic resistance to infection in ruminants [76]. The CRYL1 gene product catalyzes the dehydrogenation of L-gulonate into dehydro-L-gulonate, and its increased expression has been related to heat stress in dairy cows [77], suggesting a possible connection between metabolic stress and BRD development. ALPK1 triggers activation of the inflammatory NF-κB signaling pathway and plays a vital role in the pathogenesis of lung cancer [78]. Through bioinformatics analysis of RNA-sequencing data, we identified key genes associated with BRD. Furthermore, we developed a diagnostic model using a supervised ML algorithm to predict potential molecular markers for BRD. This study provides insights into the molecular mechanisms of BRD and contributes to the identification of potential diagnostic markers.
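The idea of genes being "consistently highlighted" by several attribute weighting algorithms can be illustrated with a minimal sketch. It is not the pipeline used in this study: the two weighting schemes (absolute correlation with the class label and information gain at a median split), the 0.5 cutoff, the placeholder gene names, and the expression values are all assumptions chosen only to show the voting logic.

```python
"""Toy consensus attribute weighting (illustrative only).

Genes are ranked by how many weighting schemes score them above a cutoff.
The schemes, cutoff, gene names, and data are hypothetical stand-ins.
"""
from math import log2, sqrt
from statistics import mean, median


def abs_correlation(values, labels):
    """Absolute Pearson correlation between expression and a 0/1 label."""
    mx, my = mean(values), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(values, labels))
    sx = sqrt(sum((x - mx) ** 2 for x in values))
    sy = sqrt(sum((y - my) ** 2 for y in labels))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def info_gain(values, labels):
    """Information gain from splitting samples at the median expression."""
    cut = median(values)
    low = [y for x, y in zip(values, labels) if x <= cut]
    high = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    remainder = len(low) / n * entropy(low) + len(high) / n * entropy(high)
    return entropy(labels) - remainder


def consensus_weights(expression, labels, cutoff=0.5):
    """Count, per gene, the schemes whose max-normalised weight >= cutoff."""
    votes = {gene: 0 for gene in expression}
    for scheme in (abs_correlation, info_gain):
        raw = {g: scheme(v, labels) for g, v in expression.items()}
        top = max(raw.values()) or 1.0  # avoid dividing by zero
        for gene, weight in raw.items():
            if weight / top >= cutoff:
                votes[gene] += 1
    return votes


# Hypothetical normalised expression: 4 healthy (0) then 4 BRD (1) samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # separates classes
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.0, 1.9, 2.1, 2.0],  # uninformative
    "GENE_C": [5.0, 5.5, 4.0, 4.8, 5.2, 6.0, 6.5, 5.8],  # partly informative
}
votes = consensus_weights(expression, labels)
ranked = sorted(votes, key=votes.get, reverse=True)
print(ranked, votes)
```

In this toy example, the gene that cleanly separates the two groups is supported by both schemes, the partly informative gene by one, and the uninformative gene by none. Genes supported by most or all schemes are the natural candidates to carry forward into a classifier such as a DT model.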
Conclusions
The integrative systems biology analysis conducted in this study revealed several key metabolic processes and signaling pathways involved in BRD. Notably, critical target genes were identified and used to construct regulatory networks with TFs. By employing a supervised ML algorithm in combination with gene clustering approaches, this study provides comprehensive insights into significant genes, immune system components, and molecular functions associated with BRD. These findings not only enhance our understanding of the underlying biological mechanisms but also establish a valuable foundation for future research aimed at clinical diagnosis, biomarker identification, and the discovery of novel therapeutic targets.
References
- 1. Scott MA, Woolums AR, Swiderski CE, Perkins AD, Nanduri B. Genes and regulatory mechanisms associated with experimentally-induced bovine respiratory disease identified using supervised machine learning methodology. Sci Rep. 2021;11(1):22916. pmid:34824337
- 2. McGill JL, Sacco RE. The Immunology of Bovine Respiratory Disease: Recent Advancements. Vet Clin North Am Food Anim Pract. 2020;36(2):333–48. pmid:32327252
- 3. Fulton RW. Bovine respiratory disease research (1983–2009). Anim Health Res Rev. 2009;10(2):131–9.
- 4. Woolums AR, Karisch BB, Frye JG, Epperson W, Smith DR, Blanton J Jr, et al. Multidrug resistant Mannheimia haemolytica isolated from high-risk beef stocker cattle after antimicrobial metaphylaxis and treatment for bovine respiratory disease. Vet Microbiol. 2018;221:143–52. pmid:29981701
- 5. Klima CL, Holman DB, Ralston BJ, Stanford K, Zaheer R, Alexander TW, et al. Lower Respiratory Tract Microbiome and Resistome of Bovine Respiratory Disease Mortalities. Microb Ecol. 2019;78(2):446–56. pmid:30918994
- 6. Delabouglise A, James A, Valarcher J-F, Hagglünd S, Raboisson D, Rushton J. Linking disease epidemiology and livestock productivity: The case of bovine respiratory disease in France. PLoS One. 2017;12(12):e0189090. pmid:29206855
- 7. Timurkan MO, Aydin H, Sait A. Identification and molecular characterisation of bovine parainfluenza virus-3 and bovine respiratory syncytial virus-first report from Turkey. J Vet Res. 2019;63(2):167.
- 8. Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet. 2019;20(11):631–56. pmid:31341269
- 9. Hong M, Tao S, Zhang L, Diao L-T, Huang X, Huang S, et al. RNA sequencing: new technologies and applications in cancer research. J Hematol Oncol. 2020;13(1):166. pmid:33276803
- 10. Okonechnikov K. High-throughput RNA sequencing: a step forward in transcriptome analysis. 2016.
- 11. Johnston D, Earley B, McCabe MS, Lemon K, Duffy C, McMenamy M, et al. Experimental challenge with bovine respiratory syncytial virus in dairy calves: bronchial lymph node transcriptome response. Sci Rep. 2019;9(1):14736. pmid:31611566
- 12. Sun HZ, Srithayakumar V, Jiminez J, Jin W, Hosseini A, Raszek M, et al. Longitudinal blood transcriptomic analysis to identify molecular regulatory patterns of bovine respiratory disease in beef cattle. Genomics. 2020;112(6):3968–77.
- 13. Cao H, Fang C, Liu L-L, Farnir F, Liu W-J. Identification of Susceptibility Genes Underlying Bovine Respiratory Disease in Xinjiang Brown Cattle Based on DNA Methylation. Int J Mol Sci. 2024;25(9):4928. pmid:38732144
- 14. Panahi B, Hejazi MA. Weighted gene co-expression network analysis of the salt-responsive transcriptomes reveals novel hub genes in green halophytic microalgae Dunaliella salina. Sci Rep. 2021;11(1):1607. pmid:33452393
- 15. Haidich AB. Meta-analysis in medical research. Hippokratia. 2010;14(Suppl 1):29–37. pmid:21487488
- 16. O’Connor AM, Hu D, Totton SC, Scott N, Winder CB, Wang B, et al. A systematic review and network meta-analysis of injectable antibiotic options for the control of bovine respiratory disease in the first 45 days post arrival at the feedlot. Anim Health Res Rev. 2019;20(2):163–81. pmid:32081117
- 17. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. pmid:19114008
- 18. Sheng H, Zhang J, Shi X, Zhang L, Yao D, Zhang P, et al. Identification of diagnostic biomarkers of and immune cell infiltration analysis in bovine respiratory disease. Front Vet Sci. 2025;12:1556676. pmid:40110435
- 19. Garcia-Moreno A, López-Domínguez R, Villatoro-García JA, Ramirez-Mena A, Aparicio-Puerta E, Hackenberg M, et al. Functional Enrichment Analysis of Regulatory Elements. Biomedicines. 2022;10(3):590. pmid:35327392
- 20. Reimand J, Kull M, Peterson H, Hansen J, Vilo J. g:Profiler--a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res. 2007;35(Web Server issue):W193–200. pmid:17478515
- 21. Kulkarni SR, Vaneechoutte D, Van de Velde J, Vandepoele K. TF2Network: predicting transcription factor regulators and gene regulatory networks in Arabidopsis using publicly available binding site information. Nucleic Acids Res. 2018;46(6):e31. pmid:29272447
- 22. Bin Y, Peng R, Lee Y, Lee Z, Liu Y. Efficacy of Xuebijing injection on pulmonary ventilation improvement in acute pancreatitis: a systematic review and meta-analysis. Front Pharmacol. 2025;16:1549419. pmid:40308770
- 23. Yearley AG, Goedmakers CMW, Panahi A, Doucette J, Rana A, Ranganathan K, et al. FDA-approved machine learning algorithms in neuroradiology: A systematic review of the current evidence for approval. Artif Intell Med. 2023;143:102607. pmid:37673576
- 24. Palmer D, Fabris F, Doherty A, Freitas AA, de Magalhães JP. Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues. Aging (Albany NY). 2021;13(3):3313–41. pmid:33611312
- 25. Wang C, Xue W, Zhang H, Fu Y. Identification of candidate genes encoding tumor-specific neoantigens in early- and late-stage colon adenocarcinoma. Aging (Albany NY). 2021;13(3):4024–44. pmid:33428592
- 26. Zhang G, Song C, Yin M, Liu L, Zhang Y, Li Y, et al. TRAPT: a multi-stage fused deep learning framework for predicting transcriptional regulators based on large-scale epigenomic data. Nat Commun. 2025;16(1):3611. pmid:40240358
- 27. Johnston D, Earley B, McCabe MS, Kim J, Taylor JF, Lemon K, et al. Messenger RNA biomarkers of Bovine Respiratory Syncytial Virus infection in the whole blood of dairy calves. Sci Rep. 2021;11(1):9392. pmid:33931718
- 28. Jiminez J, Timsit E, Orsel K, van der Meer F, Guan LL, Plastow G. Whole-Blood Transcriptome Analysis of Feedlot Cattle With and Without Bovine Respiratory Disease. Front Genet. 2021;12:627623. pmid:33763112
- 29. O’Donoghue S, Earley B, Johnston D, McCabe MS, Kim JW, Taylor JF, et al. Whole blood transcriptome analysis in dairy calves experimentally challenged with bovine herpesvirus 1 (BoHV-1) and comparison to a bovine respiratory syncytial virus (BRSV) challenge. Front Genet. 2023;14:1092877. pmid:36873940
- 30. Li J, Mukiibi R, Jiminez J, Wang Z, Akanno EC, Timsit E, et al. Applying multi-omics data to study the genetic background of bovine respiratory disease infection in feedlot crossbred cattle. Front Genet. 2022;13:1046192. pmid:36579334
- 31. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. pmid:25516281
- 32. Anders S. Analysing RNA-Seq data with the DESeq package. Mol Biol. 2010;43(4):1–17.
- 33. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1995;57(1):289–300.
- 34. DerSimonian R, Laird N. Meta-analysis in clinical trials revisited. Contemp Clin Trials. 2015;45(Pt A):139–45. pmid:26343745
- 35. Zeggini E, Ioannidis JPA. Meta-analysis in genome-wide association studies. Pharmacogenomics. 2009;10(2):191–201. pmid:19207020
- 36. Sutton AJ. Methods for meta-analysis in medical research. Chichester: Wiley. 2000.
- 37. Rau A, Marot G, Jaffrézic F. Differential meta-analysis of RNA-seq data from multiple studies. BMC Bioinformatics. 2014;15:91. pmid:24678608
- 38. Ferrari C, Mutwil M. Gene expression analysis of Cyanophora paradoxa reveals conserved abiotic stress responses between basal algae and flowering plants. New Phytol. 2020;225(4):1562–77. pmid:31602652
- 39. Botía JA, Vandrovcova J, Forabosco P, Guelfi S, D’Sa K, United Kingdom Brain Expression Consortium, et al. An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks. BMC Syst Biol. 2017;11(1):47. pmid:28403906
- 40. Asselstine V, Miglior F, Suárez-Vega A, Fonseca PAS, Mallard B, Karrow N, et al. Genetic mechanisms regulating the host response during mastitis. J Dairy Sci. 2019;102(10):9043–59. pmid:31421890
- 41. Panahi B. Global transcriptome analysis identifies critical functional modules associated with multiple abiotic stress responses in microalgae Chromochloris zofingiensis. PLoS One. 2024;19(8):e0307248. pmid:39172989
- 42. Nasiriany S. A comprehensive guide to machine learning. Berkeley: Department of Electrical Engineering and Computer Sciences, University of California. 2019.
- 43. Farhadian M, Rafat SA, Hasanpur K, Ebrahimi M, Ebrahimie E. Cross-Species Meta-Analysis of Transcriptomic Data in Combination With Supervised Machine Learning Models Identifies the Common Gene Signature of Lactation Process. Front Genet. 2018;9:235. pmid:30050559
- 44. Campbell J. Overview of Bovine Respiratory Disease Complex. Merck Veterinary Manual. 2022.
- 45. Sidhar SK, Clark J, Gill S, Hamoudi R, Crew AJ, Gwilliam R, et al. The t(X;1)(p11.2;q21.2) translocation in papillary renal cell carcinoma fuses a novel gene PRCC to the TFE3 transcription factor gene. Hum Mol Genet. 1996;5(9):1333–8. pmid:8872474
- 46. Jang SH, Jiang Y, Shin S, Jung SH, Jung CK, Chung YJ. Potential Oncogenic Role of the Papillary Renal Cell Carcinoma Gene in Non-Small Cell Lung Cancers. Yonsei Med J. 2019;60(4):326–35. pmid:30900418
- 47. Chen L, Shi C, Zhou G, Yang X, Xiong Z, Ma X, et al. Genome-wide exploration of a pyroptosis-related gene module along with immune cell infiltration patterns in bronchopulmonary dysplasia. Front Genet. 2023;13:1074723. pmid:36685920
- 48. Coyne AN, Baskerville V, Zaepfel BL, Dickson DW, Rigo F, Bennett F, et al. Nuclear accumulation of CHMP7 initiates nuclear pore complex injury and subsequent TDP-43 dysfunction in sporadic and familial ALS. Sci Transl Med. 2021;13(604):eabe1923. pmid:34321318
- 49. Wang L, Zhou L, Zhang H, Li Y, Ge X, Guo X, et al. Interactome profile of the host cellular proteins and the nonstructural protein 2 of porcine reproductive and respiratory syndrome virus. PLoS One. 2014;9(6):e99176. pmid:24901321
- 50. Lu T, Li Y, Pan M, Yu D, Wang Z, Liu C, et al. TBC1D14 inhibits autophagy to suppress lymph node metastasis in head and neck squamous cell carcinoma by downregulating macrophage erythroblast attacher. Int J Biol Sci. 2022;18(5):1795–812. pmid:35342354
- 51. Shilatifard A, Lane WS, Jackson KW, Conaway RC, Conaway JW. An RNA polymerase II elongation factor encoded by the human ELL gene. Science. 1996;271(5257):1873–6. pmid:8596958
- 52. Yu H, He K, Li L, Sun L, Tang F, Li R, et al. Deletion of STK40 protein in mice causes respiratory failure and death at birth. J Biol Chem. 2013;288(8):5342–52. pmid:23293024
- 53. Trindade BC, Chen GY. NOD1 and NOD2 in inflammatory and infectious diseases. Immunol Rev. 2020;297(1):139–61. pmid:32677123
- 54. da Silva Barcelos L, Ford AK, Frühauf MI, Botton NY, Fischer G, Maggioli MF. Interactions between bovine respiratory syncytial virus and cattle: aspects of pathogenesis and immunity. Viruses. 2024;16(11):1753.
- 55. Li Y, Wang D, Li X, Shao Y, He Y, Yu H, et al. MiR-199a-5p suppresses non-small cell lung cancer via targeting MAP3K11. J Cancer. 2019;10(11):2472–9. pmid:31258753
- 56. Liu H, Wang B, Zhang J, Zhang S, Wang Y, Zhang J, et al. A novel lnc-PCF promotes the proliferation of TGF-β1-activated epithelial cells by targeting miR-344a-5p to regulate map3k11 in pulmonary fibrosis. Cell Death Dis. 2017;8(10):e3137. pmid:29072702
- 57. Zakrzewska A, Cui C, Stockhammer OW, Benard EL, Spaink HP, Meijer AH. Macrophage-specific gene functions in Spi1-directed innate immunity. Blood. 2010;116(3):e1–11. pmid:20424185
- 58. Xia Z, Rong X, Dai Z, Zhou D. Identification of Novel Prognostic Biomarkers Relevant to Immune Infiltration in Lung Adenocarcinoma. Front Genet. 2022;13:863796. pmid:35571056
- 59. Abdelhafez N, Aladsani A, Alkharafi L, Al-Bustan S. Association of selected gene variants with nonsyndromic orofacial clefts in Kuwait. Gene. 2025;934:149028. pmid:39442823
- 60. Voth W, Schick M, Gates S, Li S, Vilardi F, Gostimskaya I, et al. The protein targeting factor Get3 functions as ATP-independent chaperone under oxidative stress conditions. Mol Cell. 2014;56(1):116–27. pmid:25242142
- 61. Fejzo MS, Anderson L, von Euw EM, Kalous O, Avliyakulov NK, Haykinson MJ, et al. Amplification Target ADRM1: Role as an Oncogene and Therapeutic Target for Ovarian Cancer. Int J Mol Sci. 2013;14(2):3094–109. pmid:23377018
- 62. Jenkins K, Khoo JJ, Sadler A, Piganis R, Wang D, Borg NA, et al. Mitochondrially localised MUL1 is a novel modulator of antiviral signaling. Immunol Cell Biol. 2013;91(4):321–30. pmid:23399697
- 63. Hasankhani A, Bahrami A, Sheybani N, Fatehi F, Abadeh R, Ghaem Maghami Farahani H, et al. Integrated Network Analysis to Identify Key Modules and Potential Hub Genes Involved in Bovine Respiratory Disease: A Systems Biology Approach. Front Genet. 2021;12:753839. pmid:34733317
- 64. Dowling A, Hodgson JC, Schock A, Donachie W, Eckersall PD, Mckendrick IJ. Experimental induction of pneumonic pasteurellosis in calves by intratracheal infection with Pasteurella multocida biotype A:3. Res Vet Sci. 2002;73(1):37–44. pmid:12208105
- 65. Werid GM, Miller D, Hemmatzadeh F, Messele YE, Petrovski K. An overview of the detection of bovine respiratory disease complex pathogens using immunohistochemistry: emerging trends and opportunities. J Vet Diagn Invest. 2024;36(1):12–23. pmid:37982437
- 66. Holschbach CL, Peek SF. Salmonella in Dairy Cattle. Vet Clin North Am Food Anim Pract. 2018;34(1):133–54. pmid:29224803
- 67. Martínez I, Oliveros JC, Cuesta I, de la Barrera J, Ausina V, Casals C, et al. Apoptosis, Toll-like, RIG-I-like and NOD-like Receptors Are Pathways Jointly Induced by Diverse Respiratory Bacterial and Viral Pathogens. Front Microbiol. 2017;8:276. pmid:28298903
- 68. Rosmarin AG, Resendes KK, Yang Z, McMillan JN, Fleming SL. GA-binding protein transcription factor: a review of GABP as an integrator of intracellular signaling and protein-protein interactions. Blood Cells Mol Dis. 2004;32(1):143–54. pmid:14757430
- 69. Sharma NL, Massie CE, Butter F, Mann M, Bon H, Ramos-Montoya A, et al. The ETS family member GABPα modulates androgen receptor signalling and mediates an aggressive phenotype in prostate cancer. Nucleic Acids Res. 2014;42(10):6256–69. pmid:24753418
- 70. Liu T, Zeng J, Zhao X, Fu R, Peng L, Li X, et al. Relationship between vascular aging and left ventricular geometry in patients with obstructive sleep apnea hypopnea syndrome-related hypertension. Sci Rep. 2025;15(1):6191. pmid:39979427
- 71. Tebar M, Destrée O, de Vree WJ, Ten Have-Opbroek AA. Expression of Tcf/Lef and sFrp and localization of beta-catenin in the developing mouse lung. Mech Dev. 2001;109(2):437–40. pmid:11731265
- 72. Wei S, Ling D, Zhong J, Chang R, Ling X, Chen Z, et al. Elk1 enhances inflammatory cell infiltration and exacerbates acute lung injury/acute respiratory distress syndrome by suppressing Fcgr2b transcription. Mol Med. 2024;30(1):53. pmid:38649840
- 73. Wang G, Xu G, Wang W. Long noncoding RNA CDKN2B-AS1 facilitates lung cancer development through regulating miR-378b/NR2C2. Onco Targets Ther. 2020;10641–9.
- 74. Chen Z, Fan Z, Dou X, Zhou Q, Zeng G, Liu L, et al. Inactivation of tumor suppressor gene Clusterin leads to hyperactivation of TAK1-NF-κB signaling axis in lung cancer cells and denotes a therapeutic opportunity. Theranostics. 2020;10(25):11520–34. pmid:33052230
- 75. Seichter D, Russ I, Förster M, Medugorac I. SNP-based association mapping of Arachnomelia in Fleckvieh cattle. Anim Genet. 2011;42(5):544–7. pmid:21906105
- 76. Ilie DE, Kusza S, Sauer M, Gavojdian D. Genetic characterization of indigenous goat breeds in Romania and Hungary with a special focus on genetic resistance to mastitis and gastrointestinal parasitism based on 40 SNPs. PLoS One. 2018;13(5):e0197051. pmid:29742137
- 77. Koch F, Albrecht D, Görs S, Kuhla B. Jejunal mucosa proteomics unravel metabolic adaptive processes to mild chronic heat stress in dairy cows. Sci Rep. 2021;11(1):12484. pmid:34127774
- 78. Liao H-F, Lee H-H, Chang Y-S, Lin C-L, Liu T-Y, Chen Y-C, et al. Down-regulated and Commonly mutated ALPK1 in Lung and Colorectal Cancers. Sci Rep. 2016;6:27350. pmid:27283888