Abstract
The fourth leading cause of death in the US, Chronic Obstructive Pulmonary Disease (COPD) is punctuated by frequent viral and bacterial infections causing severe acute exacerbations (AECOPD) and increased mortality. In previous work we have shown that altered immune cell signaling may confer increased and persistent susceptibility to infection. Here we continue this investigation by conducting broad-spectrum proteomic profiling of circulating white blood cells to assemble an empirical protein-protein interaction network associated with frequency of infectious exacerbation. In a novel extension of conventional cross-sectional data analyses, we translate these undirected protein-protein interactions into candidate regulatory relationships with both direction and mode of action. The latter are inferred by formulating and solving a constraint satisfaction (SAT) problem whereby the predicted dynamic behaviors of any valid regulatory network must support the expected persistent nature of low and high vulnerability phenotypes. Solving this SAT problem produced a set of competing candidate protein regulatory network architectures and signaling rules that unanimously highlighted several novel candidate pathway elements involved in oxidative stress response. Analysis of the overall dynamics supported by these networks again supported the hypothesis that progression beyond an immune tipping point may confer persistent susceptibility to infection and that this may constitute a stable phenotype or regulatory trap in COPD characterized by a reactive oxygen cascade.
Citation: Reimer J, Page J, Saha P, Shen S, Zhu X, Qian S, et al. (2025) Leveraging dynamic stability to infer regulation in protein-protein interaction networks: A study of infectious vulnerability in COPD. PLoS One 20(9): e0326062. https://doi.org/10.1371/journal.pone.0326062
Editor: Hong Qin, Old Dominion University, UNITED STATES OF AMERICA
Received: May 28, 2025; Accepted: August 8, 2025; Published: September 5, 2025
Copyright: © 2025 Reimer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data underlying the results and source code used to conduct the analyses may be found as Supporting Information.
Funding: This work was supported by Rochester Regional Health in conjunction with the US Department of Defense Congressionally Directed Medical Research Programs (CDMRP) under Peer Reviewed Medical Research Program (PRMRP) award W81XWH1910804 (Broderick - PI; Sethi - Partnering PI). VIDO receives operational funding from the Canada Foundation for Innovation through the Major Science Initiatives Fund and from the Government of Saskatchewan through Innovation Saskatchewan and the Ministry of Agriculture. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Our ability to accurately measure broad swaths of the proteome continues to evolve rapidly [1,2], often isolating numerous new proteins that have not yet been annotated. Moreover, even for those proteins that have been annotated, we frequently do not have a clear understanding of their function, involvement and role in the broader highly interconnected network of intracellular pathways. To capture the context in which these proteins are expressed, significant efforts have been directed at constructing protein-protein interaction (PPI) maps from experimental data using various measures of association and comparing their structure and connectivity patterns [3]. However, the direction in which a mediator affects a target protein, and whether this action activates or inactivates the latter, remains challenging to infer, limiting current analyses to static comparisons of network structure and connectivity patterns [4]. This is especially true for cross-sectional observational data, which in many cases represent the majority of available data [5]. Numerical strategies have been proposed to generate artificial time course data from cross-sectional data sets. These typically involve the re-ordering of pseudo-steady state observations such that they can be supported as an ordered sequence by an underlying forecasting model, e.g., Markov Chain (HM) [6,7]. Extensions of such methods have also been proposed that do not require that subjects align along a single trajectory but instead that subjects with proximal profiles simply evolve in a shared direction predicted using a set of stochastic differential equations based on Langevin dynamics [8].
Here, we propose a complementary approach where instead of attempting to describe a trajectory of progression, we focus on homeostatic regulatory stability as a defining characteristic by which directed functional relationships between proteins might be inferred from standard undirected PPI maps. We apply this strategy to cross-sectional protein expression data collected as part of a larger ongoing study directed at identifying the mechanistic underpinnings that drive alternative immune responses in subjects with chronic obstructive pulmonary disease (COPD) who are especially vulnerable to infection. The fourth leading cause of death in the US, COPD follows a course punctuated by frequent acute exacerbations (AECOPD), mostly caused by tracheobronchial mucosal infection by bacteria and/or viruses [9]. Paradoxically, acute and chronic infection in COPD is prevalent despite persistent excessive inflammation and vigorous immune response to the pathogens. A ‘Vicious Circle’, where adaptive defects in innate immune response induced by tobacco smoke allow persistent infection, which in turn perpetuates inflammation and dysregulates innate immune response, has been proposed to explain this paradox [10]. As a potential test of this hypothesis, we investigate the role of soluble protein signaling in the peripheral circulation as an accessory to persistent inflammatory signaling in the mucosa [11] and how this might be used as a network signature of increased vulnerability to infection. In a simplifying modification of the above-mentioned methods, like subjects are expected to progress towards one of two shared stable attractors, namely the persistent phenotypes of low and high vulnerability to infection.
Using a one-step discrete difference equation to describe network regulatory dynamics, our analysis suggests that these two phenotypes can be distinguished based on the relative abundance of 15 out of 850 quantifiable protein species and that these phenotypes share a common co-regulatory network structure and state transition logic. As such it may be possible to redirect or de-escalate vulnerability to infection in highly susceptible individuals.
Methods
Study population and sample collection
As part of a larger ongoing study, serum samples were collected from N = 67 subjects with exacerbation history documented for up to 12 years. All subjects were males between the ages of 46 and 81, with a median of approximately 1.8 exacerbations per year and a median of 7 years of follow-up visits (S1 File Suppl. Table S1). Most subjects presented with a GOLD symptom severity score corresponding to moderate (GOLD 2; n = 32) or severe illness (GOLD 3; n = 21). All serum samples were collected during a “well” visit at rest under conditions of stable illness. One subject (subject ID 153), corresponding to a data entry of 14 exacerbations per year, was removed from analysis as a suspected outlier, leaving n = 66 protein expression profiles.
Ethics statement.
This study is a sub-study of a larger group of patients with COPD and healthy controls to understand biological determinants of exacerbation frequency and was approved by the Institutional Review Boards of the Veterans Affairs Western New York Healthcare System and University at Buffalo. The participants gave written consent to the study via an IRB-approved consent form. The studies in this work abide by the Declaration of Helsinki principles. The biological samples and data were accessed for research purposes between 01/07/2019 and 01/05/2024. All data were de-identified and the authors did not have access to any information that could identify individual participants during or after data collection.
Broad-spectrum LC-MS in serum
Bio-banked samples were transported to the proteomics core facility in liquid nitrogen and stored under −80°C until analysis. Prior to analysis in duplicate by the IonStar LC-MS procedure [1], samples were treated with a detergent-cocktail buffer (0.5% sodium deoxycholate, 2% SDS, 2% IGEPAL® CA-630 and protease/phosphatase inhibitor cocktail) to thoroughly denature the proteins and enable efficient digestion, followed by a surfactant-aided/precipitation-on-pellet-digestion (SOD) method to achieve quantitative and efficient recovery of peptides [2]. To achieve in-depth profiling and accurate peptide ion-current quantification, digests were separated on a 100-cm-long column with 2-µm particles by an ultra-high-pressure chromatographic setup optimized to deliver <15% variation of peptide signal strength [12,13]. An Orbitrap LUMOS ultra-high-field and high-resolution mass spectrometer was used to acquire quantitative ion-current signal and for protein identification. Individual data files were queried against the Swiss-Prot human protein database using the MSGF+ package. A highly stringent set of criteria (e.g., <1% protein FDR and >2 peptides per protein) was employed to ensure detection of small (as low as 20%) changes in low abundance mediators with high confidence while also maintaining excellent depth and sensitivity (>6 orders of magnitude in protein abundance). This strategy has been shown to routinely quantify 5000–6000 proteins (benchmarked in human cell lysates) in up to 150 biological samples in one batch with <1% missing data, <10% quantitative variation (technical replicates) and <5% false-positive biomarker discovery rate benchmarked against an experimental null method described previously [1]. The original IonStar LC-MS data can be found in Supporting Information S2 File.
Protein interaction network assembly
A protein-protein interaction network was computed and reduced to the component sub-network most relevant to the clinical phenotype according to the steps described in Fig 1. To eliminate bias, the abundance values were normalized to unit range for each species independently (Step 1). Interactions between range-adjusted protein species were estimated using mutual information (MI), based on equal bin width, as a more sensitive measure of association. Null model or background MI values were generated concurrently by a random shuffling of abundance values for each species repeated 50 times. At each iteration random background pairwise MI values were generated and stored as a null distribution to be used for significance testing (Step 2). Estimated MI values describing protein-protein interactions need not only be significant compared to a random result but should also offer a signal of sufficient magnitude to be meaningful when compared to a non-random background. Here we apply Otsu’s algorithm [14], typically used in image analysis, to identify a threshold MI value above which a protein-protein interaction can be considered foreground, and below which it can be considered background in the context of this data (Step 3). As a sanity check, the significance of all foreground MI values compared to a random result is verified using a Bonferroni correction for multiple comparisons enforcing a false discovery rate of less than 1% (FDR < 0.01) (Step 4). Finally, all statistically significant foreground values of MI were also required to be equal to or greater in magnitude than twice the average random MI (signal to noise ratio ≥ 2) (Step 5).
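As a rough illustration of Steps 1 to 5, the following is a minimal sketch on synthetic data and not the authors' code (which is provided in S4 File). Several choices are assumptions made only for this example: MI is computed from an equal-width 2D histogram rather than with the scikit-learn routine named in the text, the helper names `binned_mi` and `otsu_cut` are hypothetical, the number of bins is arbitrary, and 200 shuffles are used instead of 50 so that the empirical p-value in this small toy can fall below the Bonferroni-adjusted level.

```python
import numpy as np

def binned_mi(x, y, bins=5):
    """Mutual information (in nats) from an equal-width 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def otsu_cut(values, nbins=64):
    """Otsu's threshold: the histogram cut maximizing between-class variance."""
    counts, edges = np.histogram(values, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(counts)                # size of the background class per cut
    w1 = counts.sum() - w0                # size of the foreground class per cut
    s0 = np.cumsum(counts * centers)
    m0 = s0 / np.maximum(w0, 1)
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (m0 - m1) ** 2
    return float(edges[np.argmax(between) + 1])  # upper edge of last background bin

rng = np.random.default_rng(0)
n_subjects, n_proteins, n_shuffles = 66, 8, 200   # toy sizes (assumed)
data = rng.random((n_subjects, n_proteins))
data[:, 1] = data[:, 0] + 0.05 * rng.standard_normal(n_subjects)  # planted association

# Step 1: normalize each species to unit range independently.
unit = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

# Step 2: observed pairwise MI and a pooled null from shuffled abundances.
pairs = [(i, j) for i in range(n_proteins) for j in range(i + 1, n_proteins)]
observed = np.array([binned_mi(unit[:, i], unit[:, j]) for i, j in pairs])
null = np.array([
    binned_mi(rng.permutation(unit[:, i]), rng.permutation(unit[:, j]))
    for _ in range(n_shuffles) for i, j in pairs
])

# Step 3: Otsu threshold separating foreground from background MI values.
cut = otsu_cut(observed)

# Step 4: empirical p-values against the null, Bonferroni-adjusted at 0.01.
p_emp = np.array([(1 + np.sum(null >= mi)) / (1 + null.size) for mi in observed])
significant = p_emp < 0.01 / len(pairs)

# Step 5: signal-to-noise ratio of at least 2 relative to the mean null MI.
strong = observed >= 2 * null.mean()

edges = [pairs[k] for k in np.flatnonzero((observed > cut) & significant & strong)]
print(edges)
```

In this toy only the planted pair of species should survive all three filters; with real data the bin count and number of shuffles would need to be chosen with the sample size in mind.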
The network formed by this subset of only the most important and significant protein-protein interactions was further pruned to include only those proteins most relevant to the clinical variable of interest, namely exacerbation frequency. Proteins indirectly related to exacerbation with a path length exceeding 2 were pruned at this stage of analysis. In other words, proteins that were first and second degree neighbors of exacerbation were retained (Step 6). Although the directionality of these relationships is unknown at this step, protein nodes in this subnetwork with a node degree of 1 cannot in principle be part of a closed loop. As we are focused here on subnetworks supporting closed loop regulatory behavior, only protein nodes with a node degree of 2 or greater were conserved (Step 7). All analyses were conducted in Python 3.12.3. Estimates of MI were computed using the feature_selection.mutual_info_regression routine in the scikit-learn library, with network analyses conducted using NetworkX 3.3. All gene ontology (GO) annotation and enrichment analyses were performed using the Panther database and analysis tools [15].
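Steps 6 and 7 can be sketched with NetworkX on a small hypothetical graph. The node names and edges below are invented for illustration. The sketch assumes that the clinical variable is itself a node of the association network, that its edges count towards node degree, and that the degree filter is applied in a single pass; the text does not say whether pruning was repeated until no degree-1 nodes remained.

```python
import networkx as nx

# Hypothetical association network; "exacerbation" is the clinical variable.
G = nx.Graph()
G.add_edges_from([
    ("exacerbation", "A"), ("exacerbation", "B"),
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"),   # D lies 3 steps from exacerbation
    ("B", "E"),   # E lies within 2 steps but has degree 1
])

# Step 6: keep first- and second-degree neighbors of the clinical variable.
dist = nx.single_source_shortest_path_length(G, "exacerbation", cutoff=2)
near = G.subgraph(dist).copy()

# Step 7: keep nodes with degree >= 2, since degree-1 nodes cannot sit on a loop.
keep = [node for node, degree in near.degree() if degree >= 2]
core = near.subgraph(keep).copy()

core_proteins = sorted(n for n in core.nodes if n != "exacerbation")
print(core_proteins)
```

Here D is dropped for being too distant and E for having a single neighbor, leaving A, B and C as the candidate closed-loop subnetwork.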
Identifying molecular phenotype for infectious risk
The existence of a molecular signature in the peripheral blood that might be indicative of a predisposition to repeated infections was explored by examining similarities between subjects in terms of the joint expression of these 15 co-regulating proteins (Fig 2). We applied a spectral clustering technique, which inherently accounts for the co-expression relationships between these 15 proteins, to identify 2 possible groups among the N = 66 subjects. Specifically, cluster identification was conducted using the cluster.SpectralClustering routine from the sklearn Python library, specifying that 2 clusters be identified in a latent space of 2 eigenvectors. The Calinski-Harabasz score was used as a measure of quality of separation between groups with respect to within-group dispersion. A distribution of Calinski-Harabasz scores was also computed as a null model by assessing clusters generated by repeated random labeling of subjects. The protein co-expression profile proposed as representative of each exacerbation phenotype was that of the subject in each group whose expression profile lay nearest to the cluster’s centroid. Once again, all computations were conducted using Python 3.12.3.
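The clustering and null-model steps might look as follows on synthetic data. This is a sketch under stated assumptions: the two planted group profiles, the noise level, the default RBF affinity and the Euclidean distance used to pick the representative subject are illustrative choices and are not taken from the study.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)

# Synthetic stand-in for 66 subjects x 15 proteins (assumed values).
low = np.ones(15)
low[:3] = [0.0, 0.0, 2.0]
high = np.ones(15)
high[:3] = [2.0, 2.0, 0.0]
X = np.vstack([
    low + 0.3 * rng.standard_normal((31, 15)),
    high + 0.3 * rng.standard_normal((35, 15)),
])

# Two clusters identified in a latent space of two eigenvectors.
model = SpectralClustering(n_clusters=2, n_components=2, random_state=0)
labels = model.fit_predict(X)
ch_observed = calinski_harabasz_score(X, labels)

# Null model: Calinski-Harabasz scores under repeated random relabeling.
null_scores = np.array([
    calinski_harabasz_score(X, rng.permutation(labels)) for _ in range(100)
])
p_value = (1 + np.sum(null_scores >= ch_observed)) / (1 + null_scores.size)

# Representative subject per cluster: the profile nearest the cluster centroid.
representatives = {}
for k in np.unique(labels):
    members = np.flatnonzero(labels == k)
    centroid = X[members].mean(axis=0)
    nearest = members[np.argmin(np.linalg.norm(X[members] - centroid, axis=1))]
    representatives[int(k)] = int(nearest)

print(ch_observed, null_scores.mean(), p_value, representatives)
```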
Inferring directed protein-protein co-regulatory actions
The protein-protein interactions shown in Fig 2 are based on a measure of association and as such are devoid of direction (source to target) and mode of action (activation or inactivation of the target node by the source node). In previous work by our group we demonstrated a method for inferring directionality and mode of action by requiring that information flow through the network enable specific observed and expected dynamic behaviors [16]. First we define and apply to each node a state transition logic whereby the next activation state of a node si(t + 1) is determined by its current state and its next logical target state or image si*(t + 1), as determined by the current states and actions of its upstream neighbors (Eq. 1(a)). In the current work the activation level of every node species is expressed as a ternary logic variable si(t) ∈ {0, 1, 2} describing low, medium, or high activation levels. The distinct and sometimes competing actions of upstream neighbors, activated above their respective perception thresholds, are combined using a simple piecewise linear function. In this way, the actions of weak inactivators are weighed against those of strong activators, and vice versa, to recommend an increase or decrease in the activation of the target node to be applied in the next iteration [17]. How quickly a node state actually achieves its target value is dependent on the update scheme and the maximum incremental change allowed.
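Because Eq. 1 is not reproduced here, the following is only one plausible reading of the transition logic, offered as a sketch. It assumes that a regulator is perceived when its state is at or above its threshold, that the image is the clipped sum of signed weights of perceived regulators (a simple piecewise linear function), and that a node moves towards its image by at most one level per step. The function names `node_image` and `next_state` are hypothetical.

```python
def node_image(inputs):
    """Target state of a node given its upstream regulators.

    inputs: list of (source_state, polarity, threshold, weight), where
    polarity is +1 (activator) or -1 (inactivator). Assumed rule: sum the
    signed weights of perceived regulators and clip to the range 0..2.
    """
    net = sum(p * w for s, p, t, w in inputs if s >= t)
    return min(2, max(0, net))

def next_state(state, image, max_step=1):
    """Move the current state towards its image by at most max_step levels."""
    delta = max(-max_step, min(max_step, image - state))
    return state + delta

# A strong activator (weight 2) opposed by a weak inactivator (weight 1).
balanced = node_image([(2, +1, 1, 2), (1, -1, 1, 1)])   # net = 2 - 1
# The inactivator drops below its perception threshold.
released = node_image([(2, +1, 1, 2), (0, -1, 1, 1)])   # net = 2
# No activator is perceived, so the node decays towards its basal state.
decayed = node_image([(0, +1, 1, 2), (1, -1, 1, 1)])    # net = -1, clipped

# A node at level 2 whose image is 0 needs two one-level steps to get there.
trajectory = [2]
while trajectory[-1] != decayed:
    trajectory.append(next_state(trajectory[-1], decayed))

print(balanced, released, decayed, trajectory)
```

Under this reading a node is at steady state whenever its image equals its current state, which is the condition the constraint problem described next imposes on both representative profiles.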
The direction (source to target), mode of action pj,i ∈ P (activating or inactivating the target) and the perception threshold tj,i ∈ T for each network interaction, together with the decisional weights wj,i ∈ W dictating each node’s state transition, are determined by formulating and solving a Constraint Satisfaction (SAT) problem [18,19]. As a simplification, note that the decay of a node species si is represented implicitly by setting a basal state of zero. That is to say that should all upstream activators of a node assume a state of zero, the desired target state for that node will become zero. This parameter search problem was encoded using the open-source Constraint Programming and Modeling in Python (CPMpy) library [20] to invoke the CP-SAT solver [21] in the Google OR-Tools library [22]. CP-SAT is a hybrid approach that combines finite domain propagation in Constraint Programming (CP) with the efficiency of Boolean Satisfiability (SAT) solvers. The steady state condition is articulated here as a constraint whereby the image or next target state at time t + 1 is identical to the current state at time t. Additional constraints described in Table 1 include network structural considerations whereby we control for the formation of source (without upstream regulators) and sink (without any downstream target) nodes as well as nodes that are devoid of upstream activators and would invariably be downregulated to their floor state. An example network structure and expression profile definition file can be found in Supporting Information S3 File. The Python source code used to parse the network definition file, state and solve the constraint satisfaction problem can be found in Supporting Information S4 File.
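To make the shape of this search concrete, the sketch below replaces the CP-SAT solver with brute-force enumeration on a hypothetical three-node path A–B–C. It is a stand-in for illustration, not the authors' encoding: the node names, the two profiles, the clipped-sum image rule and the parameter domains (mode in {absent, +1, −1}, threshold in {1, 2}, weight in {1, 2}) are all assumptions.

```python
from itertools import product

NODES = ["A", "B", "C"]
# Each undirected interaction yields two opposing candidate directed edges.
CANDIDATES = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
# An edge is either absent (None) or carries (polarity, threshold, weight).
OPTIONS = [None] + [(p, t, w) for p in (1, -1) for t in (1, 2) for w in (1, 2)]
# Two hypothetical profiles that must both be steady states.
PROFILES = [
    {"A": 0, "B": 1, "C": 1},   # "low vulnerability" stand-in
    {"A": 2, "B": 2, "C": 1},   # "high vulnerability" stand-in
]

def image(node, profile, model):
    """Assumed rule: clipped sum of signed weights of perceived regulators."""
    net = 0
    for (src, dst), params in model.items():
        if dst == node and params is not None:
            polarity, threshold, weight = params
            if profile[src] >= threshold:
                net += polarity * weight
    return min(2, max(0, net))

def is_steady(profile, model):
    """Steady state: every node's image equals its current state."""
    return all(image(n, profile, model) == profile[n] for n in NODES)

def no_source_or_sink(model):
    """Structural constraint: each node has a regulator and a target."""
    used = [edge for edge, params in model.items() if params is not None]
    has_in = {dst for _, dst in used}
    has_out = {src for src, _ in used}
    return all(n in has_in and n in has_out for n in NODES)

solutions = []
for choice in product(OPTIONS, repeat=len(CANDIDATES)):
    model = dict(zip(CANDIDATES, choice))
    if no_source_or_sink(model) and all(is_steady(p, model) for p in PROFILES):
        solutions.append(model)

print(len(solutions), "of", len(OPTIONS) ** len(CANDIDATES), "candidate models")
for edge in CANDIDATES:
    print(edge, sorted({m[edge] for m in solutions}))
```

Even in this toy, several models satisfy the constraints equally well while agreeing on most parameters. Enumeration is feasible only because the toy has 9^4 candidate models; a search space of the size reported in the Results calls for a dedicated solver such as CP-SAT.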
Partial validation of predicted regulatory actions
To provide a partial validation of predicted regulatory actions linking protein species in an undirected protein-protein interaction network, we conducted searches of the Elsevier Knowledge Graph database (Elsevier, Amsterdam) [23] using the EmBiology software interface and tools. This database relies on a standardized ontology where groups of synonyms describing 1.4 million biological entities are connected by 15.7 million relationships [24]. These are extracted from peer-reviewed literature using the MedScan natural language processing (NLP) engine [25,26] across multiple sources, including over 34.5 million PubMed abstracts, 430,000 clinical trials, full-text articles from 936 Elsevier journals and 939 non-Elsevier journals, as well as databases including over 200,000 entries from BioGRID, 10,000 from DrugBank and 1.3 million Reaxys drug-target relationships. The basic recognition rules and entity terminology in this ontology are updated annually with new relationships extracted from recently published PubMed abstracts. Extracted relationships are interpreted with the source exercising Unknown, Positive, Negative, or Undefined effects (control actions) on the target according to standardized functional processes, such as Direct Regulation, Regulation, Expression, Protein Modification, State Change, Quantitative Change, Molecular Transport, and others.
Results
A phenotypically relevant protein co-expression space
Applying the sequential selection steps described in Fig 1 we found a candidate regulatory subnetwork consisting of 15 proteins linked by 25 significant and impactful undirected interactions (~12% connection density) (Fig 2). The strength of protein-protein associations varied between a mutual information of 0.11 and 0.54, with a median association strength of 0.16 supporting a significance of p < 0.05 compared to a null model of randomly sorted values (S1 File Suppl. Table S2). Of these 15 proteins, 13 were functionally annotated in Gene Ontology (GO) (Gene Ontology Consortium; release date 2024-01-17, https://doi.org/10.5281/zenodo.10536401) (S1 File Suppl. Table S3). Subsets of genes corresponding to these proteins were significantly over-represented in GO Biological Function classes corresponding almost unanimously to detoxification of hydrogen peroxide and cellular response to stress induced by reactive oxygen species (ROS) (S1 File Suppl. Table S4(a)). Likewise, pathways curated in the Reactome database [27] that were most significantly enriched corresponded to detoxification of ROS (R-HSA-3299685) and response to chemical stress (R-HSA-9711123) (S1 File Suppl. Table S4(b)).
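The reported connection density appears to be expressed relative to the number of ordered (directed) protein pairs rather than unordered pairs; this is an inference from the arithmetic and is not stated in the text. A short check, with a hypothetical helper name:

```python
def directed_density(n_edges, n_nodes):
    """Edges as a fraction of all ordered node pairs, excluding self-loops."""
    return n_edges / (n_nodes * (n_nodes - 1))

# 25 interactions among 15 proteins: 25 / (15 * 14) = 25 / 210
density = directed_density(25, 15)
print(f"{density:.1%}")
```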
Network-informed phenotypic groups
Protein abundance values for each subject were mapped onto an ordinal qualitative scale of Low = 0, Nominal = 1 and High = 2 for each species independently (S1 File Suppl. Table S5). Spectral clustering of subjects in this discrete qualitative co-expression space of 15 proteins delineated two severity groups separated by a Calinski-Harabasz score of 15.97 (p << 0.001) (Fig 3). The first group consisted of n = 31 subjects with an average exacerbation frequency of ~2.5 episodes per year, whereas the second group consisted of n = 35 subjects with ~1.5 exacerbation episodes per year (Fig 4). Though average exacerbations differed significantly (p < 0.01), the noticeable overlap in exacerbation frequencies experienced by individuals assigned to these distinct protein co-expression classes is a reminder of the important contribution of environmental factors to the clinical outcome. Subject 3 (average of 1.47 exacerbations/year) and subject 11 (average of 2.49 exacerbations/year) were identified as representative subjects for each vulnerability group by computing the centroid of each cluster and selecting the subject with the most proximal protein abundance profile. The most important differences separating the profiles for these representative subjects consist of changes in the expression of 3 proteins. Specifically, expression of catalase (EC 1.11.1.6) and delta-aminolevulinic acid dehydratase (ALADH) (EC 4.2.1.24) (Porphobilinogen synthase) were both increased in the higher risk group, while expression of D-dopachrome decarboxylase (EC 4.1.1.84) was decreased (Fig 5). In contrast, proteins involved in innate immunity, e.g., Natural killer cell-enhancing factor A and B and Macrophage receptor with collagenous structure, were not expressed differently across groups (S1 File Suppl. Table S3).
Red dots represent n = 31 subjects with an average exacerbation frequency of ~2.5 episodes per year, whereas purple dots represent n = 35 subjects with ~1.5 exacerbation episodes per year. Clusters are separated by a Calinski-Harabasz score of 15.97 (p << 0.001 compared to a mean score of ~0.95 for 100 random labeling assignments).
Robust inference of characteristic regulatory control actions
As each of the 25 undirected protein-protein interactions was translated into a pair of opposing directed network edges, the original network architecture consisted of 50 candidate directed edges of unknown mode of action (positive or negative polarity), unknown detection threshold and unknown regulatory weight. We applied the constraints summarized in Table 1 to a solution space spanning ~10^55 possible combinations of parameter settings. As might be expected in such a large underdetermined parameter space, solving the corresponding SAT parameter estimation problem resulted in a large population of competing models describing low and high exacerbation profiles as persistent phenotypes equally well. However, in examining a sub-sample of 100 such models we found these to be highly conserved across the majority of their structure and decisional logic. Indeed, of the 150 parameters describing network structure and decisional logic, 143 are assigned the same settings in over 80 of the 100 competing models examined here, with 90 such parameter values being agreed upon unanimously (Fig 6). Details of the divergence in parameter values are presented as a phylogenetic tree diagram in S1 Fig (S1 File Suppl. Table S6). A network model representative of the overall solution set could be defined as the model that is closest to the centroid of all 100 models in the parameter space. Such a representative model is presented in Fig 7, where green-colored interactions terminating in a delta arrowhead represent activation of a target by a source mediator. Likewise, red-colored interactions terminating in a T-bar indicate inactivation of a target node by an upstream mediator. An analysis of the various centrality measures for this representative model highlighted P01700 (IGLV1–47), a key factor in antibody antigen recognition, and P30046 (DDT), an inhibitor of macrophage migration, as having the highest and second highest betweenness centralities.
Accordingly, their role as key information brokers in this network suggests dysregulation in immune recognition and innate immune response may play a role in facilitating increased vulnerability to infection in this population (S1 File Suppl. Table S7).
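The agreement counts and the choice of a representative model can be illustrated on a synthetic ensemble. The sketch below assumes the 150 parameters correspond to 50 candidate edges with three integer-coded settings each, and uses Euclidean distance to the centroid of those codes; both the coding and the metric are assumptions, and the ensemble is randomly generated rather than taken from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_params = 100, 150   # e.g. 50 edges x (mode, threshold, weight)

# Synthetic ensemble: all models share a base setting, except for 10
# parameters that are left free to vary between models.
base = rng.integers(0, 3, size=n_params)
models = np.tile(base, (n_models, 1))
free = rng.choice(n_params, size=10, replace=False)
models[:, free] = rng.integers(0, 3, size=(n_models, free.size))

# Agreement per parameter: number of models sharing the most common value.
agreement = np.array([np.bincount(col, minlength=3).max() for col in models.T])
unanimous = int(np.sum(agreement == n_models))
broad = int(np.sum(agreement > 80))

# Representative model: the ensemble member closest to the centroid.
centroid = models.mean(axis=0)
rep_index = int(np.argmin(np.linalg.norm(models - centroid, axis=1)))

print(unanimous, broad, rep_index)
```

Treating a categorical mode of action as a number when averaging is a simplification; a mismatch count (Hamming distance) to the per-parameter modal value would be a reasonable alternative.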
Parameter values describing network structure and regulatory logic are broadly conserved in over 80% of these models. The least consistent predictions involve possible reciprocal co-regulation of TGM3 (Q08188) and TGOLN2 (O43493) (relationship 35), so far undocumented.
Green arrows (and red T bars) represent activation (or inactivation) of a target node by an upstream mediator, with line width being proportional to the decision weight (thin w = 1, thick w = 2).
In a first partial verification of predicted regulatory relationships, we used the INDRA programming library [28] to interrogate the BEL Large Corpus database as well as the PathwayCommons database [29], itself an aggregate of over 23 molecular pathway databases including KEGG Pathway, Reactome, PANTHER Pathway, BioGRID, and others. Although these manually curated pathway databases did contain relationships linking some of these 15 proteins to other known regulators or targets, none contained documented relationships linking these proteins to each other. Broadening the search to also include relationships extracted from the peer-reviewed literature using automated text mining and archived in the Elsevier Knowledge Graph, we obtain 11 documented relationships linking these 15 proteins, one of which is reported in two functional categories (i.e., Regulation and Expression). The recovery of these literature informed relationships is summarized in Table 2. These 11 documented relationships are supported by a total of 66 peer-reviewed publications, 48 of these confirming a reciprocal regulatory and co-expression relationship between CAT (P04040) and SOD1 (P00441) of indeterminate action. In a majority vote across the current subset of 100 models, 8 of these 11 relationships would have been recovered correctly (~73% recall), with 10 being represented in at least a subset of models. Indeed, only in the case of documented negative co-expressive mediation of PRDX2 (P32119) by CAT (P04040) did all 100 models unanimously disagree with the 2 supporting references, predicting instead that this relationship either doesn’t exist or does not contribute towards explaining our 2 representative protein abundance profiles. 
Conversely, of the 39 potential relationships so far undocumented in the Elsevier Knowledge Graph database, 24 relationships were recruited by majority vote, including 17 voted upon unanimously by all 100 models to explain both low and high exacerbation frequency representative profiles as persistent signatures. In a worst-case scenario where all 17 unanimously voted relationships were in fact false positives, this would equate to a positive predictive value (PPV) of roughly 32% (if we consider majority ≥ 98 models as near unanimous). Of the remaining 15 undocumented relationships, the current set of 100 models would concur in a majority vote, including 3 by unanimous vote, that these may be non-existent or at the very least not strictly required to explain the data used here as constraints. Detailed results for this model set are presented in S1 File Supplemental Table S6.
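The recall and worst-case PPV quoted above appear consistent with the following arithmetic. The PPV formula is a reconstruction, assuming the 8 correctly recovered documented relationships are counted as true positives and the 17 unanimous undocumented relationships as false positives.

```python
documented = 11              # relationships found in the Elsevier Knowledge Graph
recovered_by_majority = 8    # of these, recovered correctly by majority vote
unanimous_undocumented = 17  # undocumented relationships voted unanimously

recall = recovered_by_majority / documented
worst_case_ppv = recovered_by_majority / (
    recovered_by_majority + unanimous_undocumented
)

print(f"recall = {recall:.0%}, worst-case PPV = {worst_case_ppv:.0%}")
```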
Discussion
While network associations between markers expressed in peripheral blood have shown promise [30] as illness signatures in COPD, these have focused mainly on the identification of transcript sets. Only recently has this extended to an analysis of protein signaling in frequent and infrequent exacerbators [31]. Though comparable in size to this study, the latter was conducted using plasma samples as opposed to sera, which presents a more dilute and more subtle signature. Moreover, the analysis was again limited to the identification of tightly associated sets of differentially abundant proteins with no consideration of network structural characteristics. In this work we propose a simple numerical strategy to infer possible directed and threshold-filtered control actions in an undirected protein-protein interaction network by applying expected dynamic behaviors, in this case homeostatic stability, to cross-sectional data. We demonstrate this strategy in an analysis of a high-resolution LC-MS broad-spectrum survey of serum samples collected in a larger group of n = 67 participants with stable chronic obstructive pulmonary disease (COPD) with the aim of uncovering illness-mediated changes in protein coregulation that might support increased risk of infection. In previous work by our group, an unsupervised clustering of expression profiles in the over 850 reliably identifiable protein species indicated that a subtle co-expression pattern describing as little as 2% of the variability in a subset of 160 of these proteins could delineate groups of frequent and infrequent exacerbation subjects (Calinski-Harabasz score of 160; p = 0.019). The top 15 contributors to this co-expression pattern pointed to key involvement of heme scavenging as a telltale Reactome pathway element [32].
Here we revisit the selection of these candidate marker proteins from the perspective of network biology where we (i) use mutual information (MI) as a more sensitive non-linear measure of interaction, (ii) apply more stringent restrictions on the magnitude and significance of association and (iii) enforce relevance to the clinical outcome based on network proximity. Our results suggest that a small network consisting of 25 interactions linking 15 soluble protein mediators may engage in formally maintaining an altered oxidative stress response in a way that is sympathetic to, and perhaps mutually supportive of, persistent inflammation in the mucosa described in earlier work by our group [11], with both being favorable to frequent infection. Interestingly, though there was no direct overlap between these 15 network-informed protein species and those previously identified using unsupervised clustering [32], both sets support pathway activation that predominantly aligns with detoxification of reactive oxygen species. Lack of agreement in individual markers as opposed to functional sets is a long-observed theme in transcriptomic analysis [33] and has been understood as a direct consequence of the interdependency and partial redundancy of individual molecular species in supporting a common overarching biological function. Interestingly, mediators of hemolysis and detoxification have consistently been found as markers distinguishing exposure-induced COPD from autoimmune asthma [34], suggesting therapeutic avenues targeting heme scavenging [35,36]. Moreover, a topological analysis of the most representative set of co-regulatory interactions highlights P01700 (IGLV1–47), a key factor in antibody antigen recognition [37], and P30046 (DDT), an inhibitor of macrophage migration [38], as being key information brokers in a potential dysregulation of immune recognition and innate immune response leading to increased vulnerability to infection in this population.
It is important to note, however, that the role of sex hormone-mediated immune regulation was not addressed in this first study, as all participants were male. This limitation is the object of ongoing work.
Though useful in identifying functional groups [39], protein-protein interaction networks are for the most part undirected and based on associative relationships. As such they do not describe the flow of regulatory signaling. Here we identified two subjects each with a protein expression profile representative of groups enriched in subjects with a higher or lower frequency of infectious exacerbation. As these individuals were examined during stable illness, we proposed that the corresponding protein expression profile might represent a dynamically stable steady state. An earlier analysis of sputum collected in a subset of participants (n = 12) using enzyme-linked immunosorbent assays (ELISA) [11] had suggested that a common immune signaling network of documented regulatory interactions linking 10 cytokines and chemokines was capable of explaining both frequent (≤ 2 exacerbation episodes/year) and infrequent infectious phenotypes. Once again, our analysis of LC-MS protein abundance profiles representative of high and low infectious vulnerability also suggests that these could be explained by the same unifying signaling network and regulatory program. Importantly, by imposing this expected dynamic behavior on the undirected PPI network, we were able to infer not only direction but also a threshold and a mode of regulatory action for 32 of 50 potential interactions based on a majority vote across a subset of 100 competing models (~15% connection density), with 42 regulatory interactions proposed in at least a subset of models (~20% connection density). While larger simulation studies are required to more rigorously assess this approach, it was encouraging to observe that of the 11 relationships documented in the Elsevier Knowledge Graph, a broad literature and database-informed resource, 8 were recovered in a majority vote across 100 competing models with only 1 confirmed false negative. 
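The general idea of imposing steady-state behavior on an undirected network and then voting across the surviving models can be sketched on a toy example. The sketch below is not the implementation used in this study: the four node names, the three undirected associations, the two binary resting-state profiles, and the update rule (a node is high only when its net signed input is positive) are all hypothetical simplifications, thresholds are omitted, and brute-force enumeration stands in for a constraint solver.

```python
from collections import Counter
from itertools import product

NODES = ["A", "B", "C", "D"]                        # hypothetical proteins
UNDIRECTED = [("A", "B"), ("B", "C"), ("C", "D")]   # hypothetical associations
# Each undirected association yields two candidate directed interactions.
CANDIDATES = [(u, v) for a, b in UNDIRECTED for u, v in ((a, b), (b, a))]

# Hypothetical resting-state profiles (1 = high, 0 = low), one per phenotype.
STEADY_STATES = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 0, "B": 0, "C": 1, "D": 1},
]


def next_state(state, weights):
    """Synchronous update: a node is high only if its net signed input is positive."""
    new = {}
    for node in NODES:
        net = sum(w * state[src] for (src, dst), w in weights.items() if dst == node)
        new[node] = 1 if net > 0 else 0
    return new


def is_valid(weights):
    """A model is valid if every observed profile is a fixed point."""
    return all(next_state(s, weights) == s for s in STEADY_STATES)


def solve():
    """Enumerate every polarity assignment (+1, 0, -1) and keep the valid ones."""
    models = []
    for signs in product((-1, 0, 1), repeat=len(CANDIDATES)):
        weights = dict(zip(CANDIDATES, signs))
        if is_valid(weights):
            models.append(weights)
    return models


def majority_vote(models):
    """Keep a directed edge only if one non-zero polarity wins a strict majority."""
    consensus = {}
    for edge in CANDIDATES:
        polarity, votes = Counter(m[edge] for m in models).most_common(1)[0]
        if polarity != 0 and votes > len(models) / 2:
            consensus[edge] = polarity
    return consensus


models = solve()
consensus = majority_vote(models)
# Connection density taken over all ordered node pairs (an assumption).
density = len(consensus) / (len(NODES) * (len(NODES) - 1))
print(len(models), consensus, round(density, 2))
```

Here four competing models satisfy both steady states, and they agree on four of the six candidate directed interactions, leaving the two directions of the B-C association unresolved. Applying the same ratio to the figures reported above, 32/(15 × 14) ≈ 0.15 and 42/(15 × 14) = 0.20, is consistent with the quoted connection densities if density is taken over ordered protein pairs, which is our assumption.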
Our previous work with simulated regulatory networks indicated that low false negative rates typically aligned with high false positive rates, with a recall of ~70% typically corresponding to a positive predictive value (PPV) in the order of ~25% [40] when applying some of the more capable reverse engineering methods to perturbation time courses from the DREAM3 challenge [41] in recovering a 10-node network of 25 regulatory interactions. In this limited example, even given the worst-case scenario where all novel predicted relationships were false positives, the results presented here would still be consistent with this performance while relying on cross-sectional data only. Given this important distinction, we propose that the integration of expected qualitatively described behaviors has the potential to greatly augment the usefulness of much less costly cross-sectional data sets. This framework also allows for the straightforward inclusion of general topological features as additional universal constraints [42,43], for example our requirement here that any valid network adhere to a closed loop architecture with balanced feedback (no sink nodes, no source nodes, and no strictly unipolar modulation). This has been extended in other work by our group to include more complex topological features such as regulatory sub-networks or motifs [44]. Ultimately, by approaching network inference as a constraint satisfaction problem, it is possible to create hybrid initial candidate networks [24] that directly include a priori specific well-documented relationships that further constrain the choice of feasible solutions, with majority agreement in populations of models as opposed to a single unique model having the potential to deliver important improvements in both accuracy and reliability.
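The closed-loop requirement described above lends itself to a simple structural check on any candidate network. The following sketch is one possible reading, not the constraint encoding used in this study: the function name `topology_violations` and both toy networks are hypothetical, and "strictly unipolar modulation" is interpreted here, as an assumption, to mean a node whose regulators all act with the same sign.

```python
def topology_violations(nodes, edges):
    """Return (node, reason) pairs for every violated structural constraint.

    `edges` maps (source, target) -> polarity, with +1 for activation
    and -1 for inhibition.
    """
    violations = []
    for node in nodes:
        incoming = [pol for (src, dst), pol in edges.items() if dst == node]
        outgoing = [pol for (src, dst), pol in edges.items() if src == node]
        if not incoming:
            violations.append((node, "source"))   # nothing regulates this node
        if not outgoing:
            violations.append((node, "sink"))     # this node regulates nothing
        # Assumed reading of "strictly unipolar": all regulators share one sign.
        if incoming and len(set(incoming)) == 1:
            violations.append((node, "unipolar"))
    return violations


# Hypothetical network in which every node is both activated and inhibited.
balanced = {
    ("B", "A"): 1, ("C", "A"): -1,
    ("A", "B"): 1, ("C", "B"): -1,
    ("B", "C"): 1, ("A", "C"): -1,
}
# Hypothetical open chain A -> B -> C with activating edges only.
chain = {("A", "B"): 1, ("B", "C"): 1}

print(topology_violations(["A", "B", "C"], balanced))
print(topology_violations(["A", "B", "C"], chain))
```

Under this reading the balanced network passes with no violations, whereas the open chain is rejected because A is a source, C is a sink, and B and C each receive input of a single polarity. In a constraint satisfaction setting the same conditions would be stated as constraints on the edge variables rather than checked after the fact.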
Moreover, the identification of directed regulatory networks, each capable of supporting multiple phenotypes, is consistent with the observed multi-stability in biology underlying a variety of processes ranging from dynamic coordination in the brain [45] to cell fate decisions [46]. In a departure from the frequent focus on shifts in structure or connectivity patterns separating two separate undirected graphs, each representing a different phenotype [3,4,47], our focus here was to identify a single underlying regulatory biology capable of seamlessly explaining both phenotypes as persistent or slowly progressing conditions. Such an overarching network in essence captures conventional phenotype-specific networks as simply the context-specific recruitment and dismissal of constitutive subnetworks. Importantly, the existence of a single common network and regulatory program implies that perturbations may exist that can redirect a permissive immune homeostasis in favor of galvanizing a more robust response to infection and reduced illness severity.
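How a single regulatory network can support more than one persistent phenotype may be easier to see on a toy case. The sketch below enumerates the steady states of one small signed network; the four nodes, the edge polarities, the binary states, and the synchronous update rule are all hypothetical choices made for illustration and do not correspond to any network inferred in this study.

```python
from itertools import product

NODES = ("A", "B", "C", "D")   # hypothetical proteins
# Hypothetical signed regulation: two mutually activating pairs (A-B and C-D)
# that inhibit one another through B and C.
EDGES = {
    ("A", "B"): 1, ("B", "A"): 1,
    ("C", "D"): 1, ("D", "C"): 1,
    ("B", "C"): -1, ("C", "B"): -1,
}


def step(state):
    """Synchronous update: a node is high (1) only if its net signed input is positive."""
    current = dict(zip(NODES, state))
    nxt = []
    for node in NODES:
        net = sum(pol * current[src] for (src, dst), pol in EDGES.items() if dst == node)
        nxt.append(1 if net > 0 else 0)
    return tuple(nxt)


# A steady state is any state that the update rule maps onto itself.
fixed_points = [s for s in product((0, 1), repeat=len(NODES)) if step(s) == s]
print(fixed_points)
```

With these assumed rules the same wiring supports three steady states: one where only A and B are high, one where only C and D are high, and a quiescent state where all four are low. The two non-trivial states play the role of distinct phenotypes that share one regulatory program, and which subnetwork is active depends only on where the system starts.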
Conclusion
As our ability to conduct broad-spectrum proteomic profiling continues to improve, so does the availability of protein-protein interaction network signatures. Such networks are typically extracted from cross-sectional data and as such describe undirected associative rather than source-target co-regulatory relationships. Consequently, conventional analyses remain limited to a graph-analytical comparison of changes in connectivity patterns in PPI networks that might arise across phenotypic groups. Inferring the source-target direction and the regulatory action of these interactions using conventional reverse engineering methods requires longitudinal time course data, often with stringent sampling requirements. Given the high cost typically involved in conducting longitudinal studies, it is useful to consider alternative approaches that might allow us to infer regulatory actions from resting state profiles, if only to provide a more informed basis for designing time course experiments. Here we propose such an alternative approach where we leverage the otherwise limiting resting state observations to ask what regulatory actions would be required of these protein-protein associations to formally satisfy this steady state condition. Though focused on a specific use case, the analysis presented in this work produced candidate regulatory networks that were broadly overlapping in structure and function. Wherever available, support in the peer-reviewed scientific literature of inferred regulatory actions was also highly favourable. Importantly, the proposed method is not intended to compete with proper time course analysis methods but instead to offer an extension to current graph analytical methods when only cross-sectional data are available. In this role, we propose that additional investigation of this approach is merited.
Supporting information
S1 Fig. Tree diagram decomposition in the parameter space describing structural and decisional logic similarities and differences in a subset of 100 regulatory network models.
https://doi.org/10.1371/journal.pone.0326062.s001
(TIF)
S1 File. Supplemental Tables S1 – S7
Table S1. Overview of subject characteristics; Table S2. Protein-protein interactions based on mutual information (MI); Table S3. Annotation of 15 proteins of interest with Low (0), Medium (1) or High (2) expression in 2 severity clusters; Table S4(a). Gene Ontology (GO) annotation in Biological Process classes – over-representation analysis of 15 proteins of interest; Table S4(b). Gene Ontology (GO) annotation in Reactome Pathway classes – over-representation analysis of 15 proteins of interest; Table S5. Assignment of subjects into two severity clusters (high frequency = 0; low frequency = 1) based on expression of 15 proteins of interest with Low (0), Medium (1) or High (2) expression; Table S6. Predicted mode of action (polarity) in 100 competing models of a 15 protein co-regulatory network in serum of COPD subjects representative of two exacerbation frequency clusters (activation = +1, absent = 0, inactivation = −1); Table S7. Node metrics for the representative regulatory network of Fig 7.
https://doi.org/10.1371/journal.pone.0326062.s002
(XLSX)
S2 File. Original IonStar serum LC-MS proteomic profiling data.
https://doi.org/10.1371/journal.pone.0326062.s003
(XLSX)
S3 File. Example input network structure and expression profile definition file.
https://doi.org/10.1371/journal.pone.0326062.s004
(XLSX)
S4 File. Python source file for stating and solving the constraint satisfaction problem.
https://doi.org/10.1371/journal.pone.0326062.s005
(PY)
Acknowledgments
The authors thank our veterans and their families for their service and their assistance with this research. Special thanks to the VA Western New York Healthcare System for their continued support of this biorepository (PI Sethi). This article is published with the permission of the Director of VIDO.
Mandatory Disclaimer: The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the Department of Defense.
References
- 1. Shen X, Shen S, Li J, Hu Q, Nie L, Tu C, et al. IonStar enables high-precision, low-missing-data proteomics quantification in large biological cohorts. Proc Natl Acad Sci U S A. 2018;115(21):E4767–76.
- 2. Shen S, An B, Wang X, Hilchey SP, Li J, Cao J, et al. Surfactant Cocktail-Aided Extraction/Precipitation/On-Pellet Digestion Strategy Enables Efficient and Reproducible Sample Preparation for Large-Scale Quantitative Proteomics. Anal Chem. 2018;90(17):10350–9. pmid:30078316
- 3. Athanasios A, Charalampos V, Vasileios T, Ashraf GM. Protein-Protein Interaction (PPI) Network: Recent Advances in Drug Discovery. Curr Drug Metab. 2017;18(1):5–10. pmid:28889796
- 4. Manipur I, Giordano M, Piccirillo M, Parashuraman S, Maddalena L. Community Detection in Protein-Protein Interaction Networks and Applications. IEEE/ACM Trans Comput Biol Bioinform. 2021;20(1):217–37. pmid:34951849
- 5. Ferraro KF, Kelley-Moore JA. A half century of longitudinal methods in social gerontology: evidence of change in the journal. J Gerontol B Psychol Sci Soc Sci. 2003;58(5):S264–70. pmid:14507936
- 6. Tucker A, Garway-Heath D. The pseudotemporal bootstrap for predicting glaucoma from cross-sectional visual field data. IEEE Trans Inf Technol Biomed. 2009;14(1):79–85. pmid:19527963
- 7. Li Y, Swift S, Tucker A. Modelling and analysing the dynamics of disease progression from cross-sectional studies. J Biomed Inform. 2013;46(2):266–74. pmid:23200810
- 8. Dutta P, Quax R, Crielaard L, Badiali L, Sloot PM. Inferring temporal dynamics from cross-sectional data using Langevin dynamics. Royal Society open science. 2021;8(11):211374.
- 9. Sethi S, Murphy TF. Infection in the pathogenesis and course of chronic obstructive pulmonary disease. N Engl J Med. 2008;359(22):2355–65. pmid:19038881
- 10. Mammen MJ, Sethi S. COPD and the microbiome. Respirology. 2016;21(4):590–9. pmid:26852737
- 11. Morris MC, Richman S, Lyman CA, Qu J, Mammen MJ, Sethi S, et al. Hacking the Immune Response to Infection in Chronic Obstructive Pulmonary Disease. In: 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), 2020. 548–55.
- 12. Nouri-Nigjeh E, Sukumaran S, Tu C, Li J, Shen X, Duan X, et al. Highly multiplexed and reproducible ion-current-based strategy for large-scale quantitative proteomics and the application to protein expression dynamics induced by methylprednisolone in 60 rats. Anal Chem. 2014;86(16):8149–57. pmid:25072516
- 13. Tu C, Sheng Q, Li J, Shen X, Zhang M, Shyr Y, et al. ICan: an optimized ion-current-based quantification procedure with enhanced quantitative accuracy and sensitivity in biomarker discovery. J Proteome Res. 2014;13(12):5888–97. pmid:25285707
- 14. Otsu N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans Syst, Man, Cybern. 1979;9(1):62–6.
- 15. Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou L-P, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Sci. 2022;31(1):8–22. pmid:34717010
- 16. Sedghamiz H, Morris M, Craddock TJA, Whitley D, Broderick G. Bio-ModelChecker: Using Bounded Constraint Satisfaction to Seamlessly Integrate Observed Behavior With Prior Knowledge of Biological Networks. Front Bioeng Biotechnol. 2019;7:48. pmid:30972331
- 17. Sedghamiz H, Morris M, Craddock TJA, Whitley D, Broderick G. High-fidelity discrete modeling of the HPA axis: a study of regulatory plasticity in biology. BMC Syst Biol. 2018;12(1):76. pmid:30016990
- 18. Barták R. Constraint Programming: In Pursuit of the Holy Grail. Theoretical Computer Science. 1999;17(12):555–64.
- 19. Page J, Oh H, Chacko T, Samuel IB, Lu C, Forsten RD, et al. Incorporating Regional Brain Connectivity Profiles into the Inference of Exposure-Related Neurobehavioral Burden in Explosive Ordnance Disposal Veterans. In: International Conference on Human-Computer Interaction. Cham: Springer Nature Switzerland. 2024. 121–139.
- 20. Guns T. Increasing modeling language convenience with a universal n-dimensional array, CPpy as python-embedded example. In: The 18th workshop on Constraint Modelling and Reformulation (ModRef 2019). University of Connecticut: Stamford. 2019.
- 21. Stuckey PJ. Lazy clause generation: combining the power of SAT and CP (and MIP?) solving. In: Lodi A, Milano M, Toth P. Integration of AI and OR techniques in constraint programming for combinatorial optimization problems. Berlin, Heidelberg: Springer. 2010.
- 22. Cuvelier T, Didier F, Furnon V, Gay S, Mohajeri S, Perron L. OR-Tools’ Vehicle Routing Solver: A Generic Constraint-Programming Solver with Heuristic Search for Routing Problems. In: Rennes, France, 2023. https://hal.archives-ouvertes.fr/hal-04015496
- 23. Kamdar MR, Stanley CE, Carroll M, Wogulis L, Dowling W, Deus HF, et al. Text Snippets to Corroborate Medical Relations: An Unsupervised Approach using a Knowledge Graph and Embeddings. AMIA Jt Summits Transl Sci Proc. 2020;2020:288–97. pmid:32477648
- 24. Page J, Moore N, Broderick G. A computational protocol for the knowledge-based assessment and capture of pathologies. Psychoneuroimmunology: Methods and Protocols. New York, NY: Springer US. 2024. 265–84.
- 25. Novichkova S, Egorov S, Daraselia N. MedScan, a natural language processing engine for MEDLINE abstracts. Bioinformatics. 2003;19(13):1699–706. pmid:12967967
- 26. Daraselia N, Yuryev A, Egorov S, Novichkova S, Nikitin A, Mazo I. Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics. 2004;20(5):604–11.
- 27. Milacic M, Beavers D, Conley P, Gong C, Gillespie M, Griss J, et al. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Res. 2024;52(D1):D672–8. pmid:37941124
- 28. Bachman JA, Gyori BM, Sorger PK. Automated assembly of molecular mechanisms at scale from text mining and curated databases. Mol Syst Biol. 2023;19(5):e11325. pmid:36938926
- 29. Rodchenkov I, Babur O, Luna A, Aksoy BA, Wong JV, Fong D, et al. Pathway Commons 2019 Update: integration, analysis and exploration of pathway data. Nucleic Acids Res. 2020;48(D1):D489–97. pmid:31647099
- 30. Obeidat M, Nie Y, Chen V, Shannon CP, Andiappan AK, Lee B, et al. Network-based analysis reveals novel gene signatures in peripheral blood of patients with chronic obstructive pulmonary disease. Respir Res. 2017;18(1):72. pmid:28438154
- 31. Enríquez-Rodríguez CJ, Casadevall C, Faner R, Castro-Costa A, Pascual-Guàrdia S, Seijó L, et al. COPD: systemic proteomic profiles in frequent and infrequent exacerbators. ERJ Open Res. 2024;10(2):00004–2024. pmid:38529348
- 32. Everingham E. Unsupervised Multi-Resolution Analysis of Protein-Protein Abundance Patterns in Infectious Exacerbation of COPD. Rochester Institute of Technology. 2023.
- 33. Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: is there a unique set?. Bioinformatics. 2005;21(2):171–8. pmid:15308542
- 34. Winter NA, Gibson PG, Fricker M, Simpson JL, Wark PA, McDonald VM. Hemopexin: A Novel Anti-inflammatory Marker for Distinguishing COPD From Asthma. Allergy Asthma Immunol Res. 2021;13(3):450–67. pmid:33733639
- 35. Vallelian F, Buehler PW, Schaer DJ. Hemolysis, free hemoglobin toxicity, and scavenger protein therapeutics. Blood. 2022;140(17):1837–44.
- 36. Yang N, Zhang L, Tian D, Wang P, Men K, Ge Y, et al. Tanshinone increases Hemopexin expression in lung cells and macrophages to protect against cigarette smoke-induced COPD and enhance antiviral responses. Cell Cycle. 2022;22(6):645–65. pmid:36218263
- 37. Huang X, Xiong L, Zhang Y, Peng X, Ba H, Yang P. Proteomic profile of the antibody diversity in circulating extracellular vesicles of lung adenocarcinoma. Sci Rep. 2024;14(1):27953. pmid:39543163
- 38. Merk M, Zierow S, Leng L, Das R, Du X, Schulte W, et al. The D-dopachrome tautomerase (DDT) gene product is a cytokine and functional homolog of macrophage migration inhibitory factor (MIF). Proc Natl Acad Sci U S A. 2011;108(34):E577–85. pmid:21817065
- 39. Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, et al. Using graph theory to analyze biological networks. BioData Min. 2011;4:10. pmid:21527005
- 40. Vashishtha S, Broderick G, Craddock TJA, Fletcher MA, Klimas NG. Inferring Broad Regulatory Biology from Time Course Data: Have We Reached an Upper Bound under Constraints Typical of In Vivo Studies?. PLoS One. 2015;10(5):e0127364. pmid:25984725
- 41. Prill RJ, Marbach D, Saez-Rodriguez J, Sorger PK, Alexopoulos LG, Xue X, et al. Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One. 2010;5(2):e9202. pmid:20186320
- 42. Greenfield A, Hafemeister C, Bonneau R. Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks. Bioinformatics. 2013;29(8):1060–7.
- 43. Petri T, Altmann S, Geistlinger L, Zimmer R, Küffner R. Addressing false discoveries in network inference. Bioinformatics. 2015;31(17):2836–43. pmid:25910697
- 44. Page JM, Morris MC, Skuse G, Broderick G. Refinement of Biological Regulatory Graphs using Functional Enrichment. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2022. 3580–6.
- 45. Kelso JAS. Multistability and metastability: understanding dynamic coordination in the brain. Philos Trans R Soc Lond B Biol Sci. 2012;367(1591):906–18. pmid:22371613
- 46. Li C, Wang J. Quantifying cell fate decisions for differentiation and reprogramming of a human stem cell network: landscape and biological paths. PLoS Comput Biol. 2013;9(8):e1003165. pmid:23935477
- 47. Broderick G, Fuite J, Kreitz A, Vernon SD, Klimas N, Fletcher MA. A formal analysis of cytokine networks in chronic fatigue syndrome. Brain, Behavior, and Immunity. 2010;24(7):1209–17.