Phenome-wide association study identifies new clinical phenotypes associated with Staphylococcus aureus infections

  • Patrick Allaire,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America

  • Noha S. Elsayed,

    Roles Formal analysis, Investigation, Validation

    Affiliation Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America

  • Richard L. Berg,

    Roles Formal analysis, Writing – review & editing

    Affiliation Research Computing and Analytics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America

  • Warren Rose,

    Roles Formal analysis, Methodology, Writing – review & editing

    Affiliation School of Pharmacy, University of Wisconsin, Madison, Wisconsin, United States of America

  • Sanjay K. Shukla

    Roles Conceptualization, Investigation

    shukla.sanjay@marshfieldresearch.org

    Affiliations Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States of America, Computational and Informatics in Biology and Medicine Program, University of Wisconsin-Madison, Madison, Wisconsin, United States of America

Abstract

Background

Phenome-Wide Association Study (PheWAS) is a powerful tool designed to systematically screen clinical observations derived from medical records (phenotypes) for association with a variable of interest. Despite its usefulness, no systematic screening of phenotypes associated with Staphylococcus aureus infections (SAIs) has been done, leaving potential novel risk factors or complications undiscovered.

Method and cohorts

We tailored the PheWAS approach into a two-stage screening procedure to identify novel phenotypes correlating with SAIs. The first stage screened for co-occurrence of SAIs with other phenotypes within medical records. In the second stage, significant findings were examined for the correlations between their age of onset with that of SAIs. The PheWAS was implemented using the medical records of 754,401 patients from the Marshfield Clinic Health System. Any novel associations discovered were subsequently validated using datasets from TriNetX and All of Us, encompassing 109,884,571 and 118,538 patients respectively.

Results

Forty-one phenotypes met the significance criteria of a p-value < 3.64e-5 and odds ratios of > 5. Out of these, we classified 23 associations either as risk factors or as complications of SAIs. Three novel associations were discovered and classified either as a risk (long-term use of aspirin) or complications (iron deficiency anemia and anemia of chronic disease). All novel associations were replicated in the TriNetX cohort. In the All of Us cohort, anemia of chronic disease was replicated according to our significance criteria.

Conclusions

The PheWAS of SAIs expands our understanding of SAIs interacting phenotypes. Additionally, the novel two-stage PheWAS approach developed in this study can be applied to examine other disease-disease interactions of interest. Due to the possibility of bias inherent in observational data, the findings of this study require further investigation.

Introduction

Although the Electronic Health Record (EHR) became mainstream in United States healthcare systems in the early 2000s, research using these databases had been somewhat limited in scope. In 2010, a seminal Phenome-Wide Association Study (PheWAS) found that EHR data (e.g., a phenotype) can be screened against a genetic variant to replicate known genomic associations [1]. This has helped in understanding the phenotypic pleiotropy associated with genetic variants. Within this PheWAS paradigm, the genetic variant can be replaced by any other variable of interest, for instance the clinical phenotypes presented by Staphylococcus aureus infections (SAIs). With large cohorts and rich longitudinal EHR data, this adaptation of the PheWAS offers a powerful opportunity to identify the spectrum of clinical phenotypes associated with SAIs, similar to what has been shown with COVID-19 [2].

Diseases such as bacteremia, endocarditis, and osteomyelitis resulting from S. aureus cause significant morbidity and mortality [3,4]. SAIs (referring to all diseases caused by S. aureus) pose a major problem in both inpatient and outpatient settings [5]. For example, the incidence rate of S. aureus bacteremia is up to 65 cases/100,000 patients/year [6]. The consequences of SAIs include high mortality, prolonged hospitalization, and excessive healthcare costs [7]. In a 2010–2014 study, both MRSA (methicillin-resistant S. aureus) and MSSA (methicillin-sensitive S. aureus) led to excessive hospitalization costs [8]. Risk factors for SAIs include prolonged hospitalization, surgical procedures, immunocompromised status, type 2 diabetes, and glucocorticoid treatment [9–12]. Effective management of SAIs cannot be accomplished without a full understanding of all known and yet-to-be-known disease risk factors and downstream disease complications [13]. Given the intricacies inherent in studying patients with SAIs and the sizeable amount of data available in modern EHR systems, larger and more comprehensive studies can be done, which increases the opportunity to find novel associations with SAIs. This, in turn, may improve our understanding of SAIs.

To explore the SAIs-phenome interaction spectrum, we implemented a two-step PheWAS to identify novel associations using EHR data from a multispecialty Marshfield Clinic Health System (MCHS) in Marshfield, WI. Among many associations found, we identified three new phenotypes associated with SAIs and these associations were reproduced in datasets from the All of Us, a National Institutes of Health Research database and TriNetX, a global health research network.

Materials and methods

Ethics statement: human subject research

This study utilized data from three cohorts: MCHS, TriNetX, and the All of Us Research Program (AoURP). The data from these cohorts were previously de-identified. The authors had no access to any type of data that could potentially identify participants, except for ICD code dates required to establish time correlations between two diseases. This manuscript neither discusses individual-level data nor gives exact group sizes for groups smaller than 20 individuals.

  1. MCHS: The research contained in this article was approved by the institutional review board of the Marshfield Clinic Research Institute. IRB # IRB-18-056 was granted to Sanjay K. Shukla, Ph.D., on December 12th, 2021, for study number SHU10614. Informed consent was not required, as determined by the MCHS IRB, because all the data were analyzed anonymously.
  2. TriNetX: The utilization of data from the TriNetX platform was exempted from requiring ethical approval at the researcher level. This exemption is due to the thorough de-identification process the data undergoes, which has been certified as HIPAA-compliant through expert determination. Since both TriNetX and the All of Us data are de-identified, and the research did not involve any intervention or interaction with living individuals, it is classified as "Not Human Subject" research. For more details on the TriNetX de-identification process, refer to: https://trinetx.com/wp-content/uploads/2022/04/TriNetX-Empirical-Summary-by-Brad-Malin-2020_branded.pdf
  3. All of Us Research Program: Similarly, the use of data from the All of Us Research Program doesn’t require ethical approval at the researcher level, thanks to its comprehensive de-identification procedure. While All of Us has specific criteria, such as studying groups smaller than 20, which would necessitate an IRB approval, our study doesn’t fall within these parameters. More about the All of Us Research Program’s IRB-approved protocol can be accessed here: https://allofus.nih.gov/about/all-us-research-program-protocol

Study cohorts description

The first, discovery cohort consisted of EHR data from 754,401 patients from MCHS. Our inclusion criteria were a minimum age of 18 years and at least 5 years of EHR data, defined by any ICD code entry on two separate days. We extracted the dataset for this cohort once, on March 20th, 2022. The second cohort, TriNetX, consisted of 82 healthcare organizations with combined EHR data from 109,884,571 patients. We accessed these data directly from the TriNetX platform on May 15th, 2023 and used them to calculate odds ratios (see below). Note that, due to limitations of TriNetX data access, the only initial inclusion criterion for this cohort was that all patients be at least 18 years of age, with no restriction on length of EHR. To investigate the correlation of age of onset between SAIs and other phenotypes, we downloaded a TriNetX sub-cohort that included only patients with SAIs, together with their full, de-identified EHR records. This allowed us to restrict patient eligibility as described above for the MCHS cohort. This dataset was obtained on March 28th, 2022. The third cohort was derived from All of Us [14] and consisted of 118,538 participants filtered from 413,457 initial participant entries (CDR version C2022Q4R9, May 2023). Of the 413,457 participants, 287,012 had some type of EHR data accessible, and 254,487 had EHR data defined by any ICD code entry on two separate days. We further restricted the cohort to participants with five or more years of EHR data. That gave us a cohort of 166,790 individuals, of whom 118,538 had genetically predicted ancestry; these we retained. The TriNetX research network and All of Us cohorts were used as replication cohorts for any novel SAI associations identified with the MCHS cohort. Demographics of the cohorts used in the first step PheWAS are presented in S1.1 Table in S1 Table and those for the second step PheWAS in S1.2 Table in S1 File.

Two steps PheWAS screen

The two-step PheWAS screening was conducted with the MCHS dataset, as outlined in Fig 1, to identify novel associations. The dual PheWAS approach uses different input parameters to evaluate associations between SAIs and a phecode (phecodes are codes that rapidly define case/control status for hundreds of clinical conditions). Briefly, a disease in the EHR is defined by ICD-9 and ICD-10 codes. These ICD codes are mapped to 1866 phecodes (extracted from phecode map version 1.2; https://phewascatalog.org/phecodes) [15] for use in the PheWAS. In the first PheWAS, we screened medical records for phecodes that co-occurred with the SAIs phecode, 41.1. We chose phecode 41.1 as it encompasses all forms of methicillin-sensitive or methicillin-resistant SAIs at any clinical site, including blood. Phecode 41.1 also includes the history of the disease as stated elsewhere. Here, a logistic regression model was used in which the response variable was SAIs and the other phenotype was the predictor. The model included age at the last healthcare visit, sex, and ancestry (or race) as covariates. Note that race was used instead of ethnicity in TriNetX as the latter was not available. Basic cohort characteristics for this PheWAS are given in S1.1 Table in S1 Table. Given that SAIs are definitive diagnoses, we required only one occurrence of a phecode to designate a case. Patients without an SAIs phecode record were labelled as controls. We considered association results only when at least 50 patients were coded for both SAIs and the test phecode [16]. Although we acknowledge the subjective nature of this minimum count, the goal here was to screen on sample size and remove imprecise estimates from consideration.

Fig 1. Two step PheWAS screening approach.

Flow chart of the PheWAS 1 and 2 screening approaches and steps used in PheWAS 1 and PheWAS 2 statistical model.

https://doi.org/10.1371/journal.pone.0303395.g001

This screening left 1373 phecodes tested. All phecodes that reached the Bonferroni-adjusted p-value threshold for multiple testing (p<3.64e-5, alpha = 0.05, 1373 tests) and had an odds ratio (OR) of ≥ 5 were carried into the second sequential PheWAS. The high OR cutoff, which selected associations above the 75th percentile, reflects our goal to focus only on the most impactful associations, although we understand that less significant but relevant associations may be missed.
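
The first-stage criteria above reduce to simple arithmetic, which the following minimal Python sketch illustrates. It is not the authors' code (the analyses were run in R); the function names are hypothetical, and the minimum co-occurrence count is applied here as a filter for convenience, whereas the text applies it before testing.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold under a Bonferroni correction."""
    return alpha / n_tests


def passes_first_stage(p_value, odds_ratio, n_cooccurring,
                       threshold, min_or=5.0, min_count=50):
    """Apply the three first-stage criteria named in the text:
    enough co-occurring patients, Bonferroni significance, and OR >= 5."""
    return (n_cooccurring >= min_count
            and p_value < threshold
            and odds_ratio >= min_or)


# 0.05 / 1373 tests gives the reported threshold of about 3.64e-5.
stage1_threshold = bonferroni_threshold(0.05, 1373)
```

With this threshold, a hypothetical phecode with p = 1e-6, OR = 6.0, and 80 co-occurring patients would be carried forward, while the same phecode with OR = 4.9 would not.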

The second PheWAS tested the relation of age at first onset between SAIs and a phecode. To avoid statistical inflation caused by one-time visits that could lead to aberrant simultaneous coding of SAIs and any other phecode, we excluded any patient where the onsets of SAIs and the phecode were within 24 h (1 day). Age at first SAIs (defined as the first ICD code entry mapped to SAI) was assigned as the response variable and onset of the test phenotype (defined as the first ICD code entry mapped to this phenotype) as a predictor variable. Covariates for adjustment included age, sex, and race at the last visit. Basic cohort characteristics for this PheWAS are given in S1.2 Table in S1 File. In this second screen, we considered only association test results where at least 50 patients presented the phecode within 60 days of onset of SAIs. We retained phecodes reaching Bonferroni adjusted p-value threshold for multiple testing (p<5.05e-4, alpha = 0.05, 99 tests) for further classification either as risk factors or complications of a SAI, based on which condition appeared first.
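
The eligibility rules for the second screen can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, onset differences are assumed to be expressed in days (phecode onset minus SAI onset), and "within 24 h" is interpreted as an absolute difference of one day or less.

```python
def second_stage_eligibility(onset_diffs_days, exclude_within=1.0,
                             acute_window=60.0, min_acute=50):
    """Filter patients and check the minimum-count rule for one phecode.

    onset_diffs_days: per-patient (phecode onset - SAI onset), in days.
    Returns the retained differences, the number of retained patients whose
    phecode onset falls within the acute window, and whether the phecode
    has enough such patients to be tested.
    """
    # Drop patients coded for both conditions within one day (assumed rule).
    kept = [d for d in onset_diffs_days if abs(d) > exclude_within]
    # Count patients whose phecode onset is within 60 days of SAI onset.
    n_acute = sum(1 for d in kept if abs(d) <= acute_window)
    return kept, n_acute, n_acute >= min_acute


# Second-stage Bonferroni threshold: 0.05 / 99 tests, about 5.05e-4.
stage2_threshold = 0.05 / 99
```

The retained differences would then feed the age-of-onset regression described in the text, which is not reproduced here.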

Next, we removed patients with known risk factors, including phecodes 197 (chemotherapy), 250 (all types of diabetes), 429.1 (heart transplant/surgery), 510.2 (lung transplant), 573.2 (liver replacement by transplant), 585.32 (end stage renal disease), 851 (complications of transplant and reattached limbs), and 860 (bone marrow or stem cell transplant). Once this removal was complete, we performed the second PheWAS again. This order helped account for known risk factors. Statistical models are formally defined in S2 Table. All statistical analyses were performed in R, version 3.6.3 or higher, using routine basic packages.

Replication of novel associations with TriNetX and All of Us

In Table 3, we calculated odds ratios (ORs) using a contingency table because the TriNetX web-based analysis platform limits simultaneous access to all TriNetX EHR data. Therefore, these ORs are not adjusted for age, sex, and race. To allow a fair comparison, we also recalculated MCHS ORs using a contingency table and data derived from TriNetX (which includes MCHS) to develop two comparable datasets. The PheWAS of age of first onset used linear regression models and was performed as described above. The All of Us analyses followed the same procedure as for TriNetX.
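
An unadjusted OR from a 2×2 contingency table can be computed as in the sketch below. The function name and the example counts are hypothetical, and the Wald confidence interval is a common convention added here for illustration; the text does not state how intervals were obtained.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: patients with SAI and the phenotype
    b: patients with SAI, without the phenotype
    c: patients without SAI, with the phenotype
    d: patients with neither
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_2x2(120, 880, 2000, 97000)
```

Because no covariates enter this calculation, the result is not comparable to the adjusted ORs from the logistic regression models, which is why the MCHS values were recalculated the same way.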

Redundant phecode pruning

Since phecodes within a class are often biologically similar, reporting the whole class may provide redundant information. To limit redundancy, we selected a sentinel phenotype for each class of phecodes as the one with the highest OR. For example, phecode class 250 represents both types of diabetes, whereas the biology of type I diabetes (T1D) is different from that of type II diabetes (T2D), and we made that distinction in sentinel phecode selection. The sub-class phecode 250.1 represents T1D and has six sub-phecodes (250.10 to 250.15). The sub-class phecode 250.2 represents T2D and also has six sub-phecodes (250.20 to 250.25). We considered only sub-phecodes with a p-value that met the Bonferroni threshold and, among those codes, chose the one with the highest OR as the sentinel phecode. Using T1D as an example, phecodes 250.10 and 250.13 were significant, but 250.13 had a higher OR than 250.10. Therefore, only 250.13 was reported and represented the sentinel phecode of that group. A similar approach was used for T2D. If there were no obvious underlying biological distinctions within a phecode class, which was true for most classes, all sub-codes were curated together.
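
The sentinel selection rule can be sketched in a few lines of Python. The function name, the record layout, and the numeric values in the example are hypothetical; the grouping label (for instance, T1D versus T2D within class 250) is assumed to have been assigned by manual curation, as the text describes.

```python
def select_sentinels(results, threshold):
    """Pick one sentinel phecode per group.

    results: list of dicts with keys 'phecode', 'group', 'p', and 'or'.
    Only phecodes with p below the threshold are considered; within each
    group, the phecode with the highest odds ratio is kept.
    """
    best = {}
    for record in results:
        if record["p"] >= threshold:
            continue  # not significant after Bonferroni correction
        current = best.get(record["group"])
        if current is None or record["or"] > current["or"]:
            best[record["group"]] = record
    return {group: record["phecode"] for group, record in best.items()}


# Illustrative values only: 250.11 has the largest OR but is not significant.
example = [
    {"phecode": "250.10", "group": "T1D", "p": 1e-8, "or": 6.0},
    {"phecode": "250.13", "group": "T1D", "p": 1e-7, "or": 9.0},
    {"phecode": "250.11", "group": "T1D", "p": 0.01, "or": 12.0},
]
sentinels = select_sentinels(example, 0.05 / 1373)
```

In this example the sentinel for the T1D group is 250.13, mirroring the case described in the text.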

Results

The MCHS cohort, which was utilized as the discovery cohort, consisted of 754,401 patients, 52.4% of whom were female. The average age of the cohort was 55.6 years (SD = 20.7), with an average EHR length of 20.5 (SD = 10.2) years. It is notable that a significant proportion of the MCHS patient population (61.1%) self-identified as white of European descent. However, the ethnicity of a considerable fraction (36.7%) of this EHR population remained unknown (S1.1 Table in S1 Table).

The two steps PheWAS screen identified 41 phecodes associated with SAIs

The two-step PheWAS process with a threshold for flagging associations is illustrated in Fig 1, with details of the individual PheWAS model described in S2 Table. All statistical models were adjusted for age and sex at the last diagnosis. The only exception to this was the manual calculation of ORs performed with TriNetX and the equivalent for MCHS and AoURP in Table 3 (see Methods). The first PheWAS identified 236 associations (S3.1 Table in S3 Table), and these were carried forward to the second PheWAS for age correlation. Due to our criteria, which required at least 50 patients to have their first phecode registered within 60 days of the first SAI recorded, and the removal of individuals coded within 24 hours of an SAI (to avoid potential data inflation from one-visit specialty treatments), only 99 out of the 236 phecodes were eligible for testing in the second PheWAS. After the second screening, 94 phecodes remained significant (S4.1 Table in S4 Table). However, many of these phecodes belonged to the same phecode class, rendering them biologically similar and redundant. To address this, we selected the top association within each phecode class based on its p-value significance and the highest OR (see Materials and methods and S5 Table). We referred to this top association as the ’sentinel phecode’. This selection process reduced the number of significant associations to 41. The summary statistics from the two-step PheWAS for these 41 phecodes are presented in Table 1.

Table 1. Summary statistics of the MCHS dual PheWAS of SAIs (sentinels).

https://doi.org/10.1371/journal.pone.0303395.t001

Sentinel phecodes associated with SAIs are enriched in the circulatory system disease category

We classified phecodes into disease categories to give a general sense of their association with biological/physiological pathways [17] (https://phewascatalog.org/). Out of 17 disease categories, 12 contained phecodes associated with SAIs (S6 Table). An enrichment of circulatory system phecodes emerged (7/41, or 17% of total phecodes), indicating that endovascular physiology is strategically important for SAIs. This is further supported by the inclusion of five other phecodes categorized under the hematopoietic category, which interacts with the circulatory system through the blood transport of hematopoietic cells. Altogether, these observations suggest that a dysfunctional blood/circulatory system may either provide an opportunity for infection, or that SAIs can perturb the system.

Phecodes associated with SAIs are mainly risk factors

To further unravel how the 41 phecodes were associated with SAIs, we classified them as either a potential risk factor or a complication of SAIs. We counted the number of times the age of first onset of a phecode occurred before (therefore a risk) SAIs versus after (a complication). We counted occurrences (pre-SAIs and post-SAIs) for both lifetime and acute, which we defined as within 60 days of a SAIs onset (Table 2). Using this classification method, out of the 41 phecodes, 24 associations were classified as either risk factors or complications of SAIs, and 17 remained unclassified (referred to as cryptic). Not surprisingly, some clinical identifiers of SAIs appeared as risk factors and represent 17 out of the 23 as shown in Table 2. We carried out a search through both PubMed and Google to identify established or reported associations in the literature. Criteria for labeling an association as established were the presence of more than one confirmatory study or a large patient cohort or a finding with an odds ratio of >2. Criteria for previously reported but less than established associations were as: i) mentioned in a case study or, ii) results from in vitro experimentation, and/or iii) symptoms of established associations like shortness of breath for MRSA pneumonia (Table 2).
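
The counting scheme above can be illustrated with the following Python sketch. It is a sketch under stated assumptions rather than the authors' procedure: the function names are hypothetical, onset differences are assumed to be in days (phecode onset minus SAI onset), and the cut point of 1 for reading a ratio as "risk" or "complication" is an assumption, since the text does not give an explicit threshold.

```python
def risk_complication_ratio(onset_diffs_days, window=None):
    """Ratio = count(phecode before SAI) / count(phecode after SAI).

    window: if given, only patients whose phecode onset falls within
    this many days of SAI onset are counted (the acute ratio).
    """
    diffs = onset_diffs_days
    if window is not None:
        diffs = [d for d in diffs if abs(d) <= window]
    before = sum(1 for d in diffs if d < 0)  # phecode first: risk
    after = sum(1 for d in diffs if d > 0)   # SAI first: complication
    return before / after if after else float("inf")


def classify(onset_diffs_days, acute_window=60):
    """Label a phecode from its lifetime and acute ratios (assumed rule)."""
    lifetime = risk_complication_ratio(onset_diffs_days)
    acute = risk_complication_ratio(onset_diffs_days, acute_window)
    if lifetime > 1 and acute > 1:
        return "risk factor"
    if lifetime < 1 and acute < 1:
        return "complication"
    return "cryptic"  # lifetime and acute ratios disagree
```

For example, under this rule a phecode whose lifetime ratio is above 1 but whose acute ratio is below 1 would be labeled cryptic.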

Table 2. Phenotype risk versus complication.

Ratio = counts of risk / counts of complication.

https://doi.org/10.1371/journal.pone.0303395.t002

Histograms for two known risk factors compared with our three new associations are displayed in Fig 2. The histogram counts represent the difference between the age of onset of phecode X and SAIs in one-year bins. One new risk factor, long-term aspirin usage, was mainly coded prior to the onset of SAI, similar to known risk factors such as T2D and acute renal failure (Fig 2A/B/E, and Table 2). Additionally, one association that was classified as a complication was novel (iron deficiency anemia due to chronic blood loss) and showed consistent trends with both lifelong and acute categories (Fig 2D and Table 2). Finally, anemia of chronic disease is classified as cryptic (Fig 2C and Table 2) because the lifelong ratio appears as a risk while acute as a complication.

Fig 2. Histogram of known and novel associations.

A) Acute renal failure (known association phecode 585.1). B) Type 2 diabetes (known association phecode 250.25). C) Plasma protein metabolism disorder (novel association, phecode 270.38). D) Aspirin usage (novel association, 457.3). E) Anemia of chronic disease (novel association, phecode 285.2).

https://doi.org/10.1371/journal.pone.0303395.g002

Novel associations are replicated in TriNetX and All of Us

We verified the three novel associations in TriNetX and All of Us by calculating ORs using contingency tables. Inexplicably, the ICD code for ‘long-term use of aspirin’ was not available in All of Us; as a result, we could not determine whether the association between SAIs and long-term use of aspirin is replicable in All of Us. We additionally provided ORs corrected for possible confounding effects from known risk factors (chemotherapy, diabetes, organ transplants, surgeries, end-stage renal disease) (see S2 Table for phecode/ICD10 code usage). Correction was done by removing patients with these risk factors. We observed that ORs and statistical significance were generally maintained across MCHS, TriNetX, and All of Us, although they trended higher in TriNetX and lower in All of Us, likely due to the sizes of the cohorts (Table 3). Interestingly, removing risk factors in the MCHS cohort had an impact on the anemia of chronic disease OR, which decreased from 16.99 to 4.07 but remained highly significant. In contrast, the OR for anemia of chronic disease in both the TriNetX and All of Us cohorts remained constant (Table 3). Consistent with the replicated ORs, the age of first onset correlation also remained significant and mostly similar in effect size in TriNetX (Table 3). While anemia of chronic disease was found to be cryptic in the MCHS cohort, even after correcting for at-risk patients, TriNetX showed it as a consistent complication of SAIs (Table 3 and Fig 3).

Fig 3. Histograms of difference between age of onset anemia of chronic disease and SAIs.

TriNetX results after removal of patients at risk for SAIs. A) Lifelong count distribution across the medical record. B) Acute count distribution over a window of +/- 60 days centered on the age of onset of SAIs. Risk phecodes include all diabetes, acute renal failure, all transplant, and chemotherapy codes. The X axis measures the difference between the age of first onset of phecode X and the age of first onset of SAIs. The X axis is in years for A) and in days for B).

https://doi.org/10.1371/journal.pone.0303395.g003

Table 3. Novel SAIs associations are replicated in the TriNetX and All of Us cohorts.

https://doi.org/10.1371/journal.pone.0303395.t003

Interestingly, the OR of the "long-term use of aspirin" observation in the MCHS increased to 19.29 from 5.25 after removing patients with known risk factors, whereas in TriNetX the OR decreased to 4.86 from 6.54. The observed cohort-specific trend appears to be due to differences in the frequency of known risk factors, which, according to the data, seem to be considerably higher in the MCHS cohort. Summary statistics for PheWAS of occurrence are shown in S3.2 Table in S2 File, and those of PheWAS of age correlation are shown in S4.2 Table in S3 File. Fig 4 summarizes our findings.

Fig 4. Graphical summary results of the PheWAS of SAIs.

The left panel describes four risk factors categories with examples of two phecodes in each category: 1) Known Risk factors, 2) Clinical identifiers, 3) S. aureus manifestations, and 4) Novel Associations. The right panel describes two novel complications after SAIs.

https://doi.org/10.1371/journal.pone.0303395.g004

Discussion

Today’s ease of access to the EHR systems of large health care organization and centralized EHR data set collection, such as TriNetX and All of Us, combined with sophisticated statistical screening methods, enable researchers to discover disease associations that were previously unachievable. In our study, we used a two-step PheWAS method to pinpoint and categorize health conditions not previously linked to SAIs. We found 41 unique health conditions associated with SAIs, grouped into 12 disease categories. The circulatory system had the largest representation with seven conditions, highlighting its significant role in the health complications caused by SAIs, regardless of the infection source. Three of the health conditions—diabetes, congestive heart failure, and acute renal failure—have already been associated with SAIs, validating our method [18,19]. To partially summarize our results, we classified the phenotypes associated with increased risk of SAIs into four main categories: i) previously identified known risk factors, ii) clinical identifiers, iii) novel associations, and iv) Staphylococcal manifestations. We define “clinical identifier” as a known symptom of SAI that is detected prior to confirmation of an SAI. The novel association category included: a) long-term use of aspirin, b) iron deficiency anemia, and c) anemia of chronic disease.

One advantage of our algorithm was in its ability to identify clinical identifiers as risk factors. For instance, previous studies have noted white blood cell count [20] or symptoms like delirium [21] after sepsis during patient evaluation. In our study, we recorded these indicators before the formal clinical confirmation of a SAI, indicating they can serve as warning signs for risk of infection. Additionally, our study supported the link between two previously suggested clinical identifiers and SAIs: disorders of magnesium metabolism (including both hypomagnesemia and hypermagnesemia) [22] and decubitus ulcers [23]. A deficiency of magnesium may reduce innate host defense against S. aureus, increasing the risk of infection. Notably, even after adjusting for diabetes or renal failure, magnesium imbalance remained a significant factor (see S3.2 Table in S2 File and S4.2 Table in S3 File). Xie and Yang (2016) highlighted the role of magnesium in fighting S. aureus infection, given its antimicrobial action against the bacterial membrane [22]. Decubitus ulcers, common in patients who are immobilized, can lead to various types of infections. It’s known that S. aureus can colonize these wounds [24], which may result in S. aureus bacteremia [23,25]. In line with this, we found cases where decubitus ulcer diagnosis was noted before an SAI incident. Identifying these kinds of associations is crucial, as it can contribute to the prevention and early diagnosis of SAIs, ultimately improving patient care.

Some conditions caused by S. aureus, referred to here as S. aureus manifestations, are typically identified around the time of SAIs diagnosis or shortly thereafter. These include diseases like cellulitis [26], MRSA pneumonia [27], carbuncle [26], and pyogenic arthritis [28] among others. There are also symptoms such as limb swelling, erythema, and edema that are coded before an SAI diagnosis. The seeming discrepancy may be due to the fact that identifying and confirming S. aureus involves a confirmation by lab-based culture, which can cause reporting delays [29].

In our study, long term use of aspirin was determined to be a risk factor for SAI. Known for its anticoagulant properties [30], aspirin use could heighten susceptibility to SAIs. Coagulation is a natural immune response to infection [31], so decreased coagulation might make an SAI more likely. It’s also possible that aspirin is prescribed to patients with abnormal coagulation. We suggest that the driver of SAIs is abnormal coagulation, rather than aspirin use per se, as these patients may already be on aspirin. Interestingly, aspirin use in hemodialysis patients has been reported to lower the risk of SAIs [32]. Alternatively, since SAI may also occur in patients who are chronically ill, aspirin usage may simply just be a marker of chronic illness specifically in patients with e.g., cardiac, vascular, or neurologic diseases as opposed to being directly related to SAI pathogenesis. In future focused studies, aspirin usage could be investigated by accounting for or excluding patients with confounding risk factors as mentioned above.

Our study verified two new associations, both of which were consistently observed in two additional cohorts, bringing the total to three cohorts confirming these novel complications. The first is iron deficiency anemia, which is physiologically linked to SAIs. S. aureus acquires iron from hemoglobin during invasive infections using its iron-regulated surface determinant receptor, IsdB [33]. This process aids in further invasion and persistence of S. aureus in the host, potentially leading to lower iron availability post-SAI. The second complication is anemia of chronic disease, previously reported as a risk factor for an SAI [9]. This condition may result from immune system changes affecting iron homeostasis due to bacterial, parasitic, fungal infections, or even cancer [34]. However, our findings suggest that anemia is more likely a complication than a risk factor. This is supported by Jensen et al., who reported the risk of anemia and hyponatremia following hospital-acquired S. aureus bacteremia [9]. Musher and colleagues also noted anemia preceding pneumococcal pneumonia, including severe cases with bacteremia [35]. Similarly, a mouse model showed S. aureus infection causing leukopenia, lymphopenia, neutrophilia, monocytosis, and microcytosis, the latter of which can lead to anemia [36]. Yet, to our knowledge, there are no existing reports specifying the type of anemia associated with SAIs. Importantly, for anemia of chronic disease, the long-term and acute risk/complication ratios remained low even after adjusting for known risk factors (lifetime 0.67, n = 2426; acute 0.69, n = 608). This suggests that anemia of chronic disease is likely a genuine complication of SAIs. Alternatively, as with long term usage of aspirin, anemia may just be a marker of chronic illness in patients who have previously coincidently been sick with SAI. Further studies will be needed to address this.

In our study, seventeen phecodes associated with SAIs were categorized as cryptic (Table 2) as they showed different lifetime and acute risk/complication ratios. For example, endocarditis is a disease that can result from SAIs in both community and hospital settings, making both associations plausible. For instance, patients might be hospitalized due to primary endocarditis [19] resulting from infections originating in the community (pre-SAI phecode). Alternatively, they might develop endocarditis following a S. aureus bacteremia acquired in the hospital (post-SAI phecode) [37].

Although our PheWAS approach has uncovered novel associations with SAIs, it is not without limitations. First and foremost, we intentionally designed our study to be strictly observational and not to determine cause and effect. As such, it should be regarded as hypothesis generating only. An important limitation of our study (and of all studies based on ICD coding) is that clinicians’ code selection may be inconsistent and variable. However, this limitation may be less relevant in our study because the phecode mapping system clusters similar ICD-9 and ICD-10 codes into a single phecode. The phecode mapping system has its own distinct set of limitations. One limitation inherent to the design of this study is the simplification introduced by mapping ICD codes to phecodes. This mapping is meant to simplify association testing, as close to 60,000 ICD-9 and ICD-10 codes exist. An immediate drawback of collapsing several ICD codes into one phecode is that it removes information that matters from a treatment standpoint. However, treatment is not the focus of this study; rather, the aim is to identify risk factors and complications. Another weakness of the phecode mapping system, specific to the PheWAS of age of first onset, is the inclusion of “history” codes. These codes indicate that a condition previously occurred without specifying when or at what age. This could confound the statistics from the PheWAS of age of first onset if “history” ICD codes occur frequently. However, we find that, at least for SAI, “history” codes are used infrequently, so they are unlikely to significantly affect the results. Another weakness of the phecode system is its grouping into disease categories that may not fit an infectious disease point of view. For example, cellulitis, abscess, and decubitus ulcer are all considered "dermatologic" but are different conditions. 
A similar mis-categorization is observed with osteomyelitis, MRSA pneumonia, and endocarditis, which are listed under the musculoskeletal, respiratory, and circulatory systems, respectively, rather than as infectious diseases. However, the key point of these disease categories is that they reflect which organ systems are affected and, therefore, which are more prone to infection. A further limitation of our study is the method we used to classify associations as either risk factors or complications. This classification should be considered within the context of the already understood pathophysiology of the disease. Sometimes a complication may appear as a risk factor because of when the phenotype or clinical identifier (e.g., cellulitis or increased white blood cell count) was observed and coded in the EHR. A final limitation concerns the generalization of our results to distinct ancestries. Most of the patients in the MCHS are of European ancestry, and many have no reported ethnicity but are presumably European given known regional history. Our results are thus not generalizable to all ancestries but reflect those of white/European patients.
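The effect of “history” codes on age of first onset, discussed among the limitations above, can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `first_onset_ages`, the ICD and phecode labels, and the patient events are all hypothetical placeholders chosen for illustration. The sketch collapses ICD codes to phecodes, takes the youngest age per phecode, and optionally skips codes flagged as “history” so the two results can be compared.

```python
def first_onset_ages(events, icd_to_phecode, history_codes=frozenset()):
    """Return {phecode: youngest recorded age} for one patient.

    events: iterable of (icd_code, age_at_event) pairs.
    history_codes: ICD codes to skip because they carry no onset timing.
    """
    onset = {}
    for icd_code, age in events:
        if icd_code in history_codes:
            continue  # a "history" code says the condition occurred, not when
        phecode = icd_to_phecode.get(icd_code)
        if phecode is None:
            continue  # unmapped ICD codes are ignored in this sketch
        if phecode not in onset or age < onset[phecode]:
            onset[phecode] = age
    return onset


# Hypothetical placeholder codes, not entries from any real phecode map.
ICD_TO_PHECODE = {
    "ICD9:X1": "PHE_1",   # an ICD-9 and an ICD-10 code collapse to one phecode
    "ICD10:X2": "PHE_1",
    "ICD10:Y1": "PHE_2",
    "ICD10:HX": "PHE_3",  # a "history of" code
}
HISTORY_CODES = frozenset({"ICD10:HX"})

# Hypothetical patient record: (ICD code, age in years when it was recorded).
events = [
    ("ICD9:X1", 52.0),
    ("ICD10:X2", 47.5),
    ("ICD10:HX", 60.0),
    ("ICD10:Y1", 58.0),
]

with_history = first_onset_ages(events, ICD_TO_PHECODE)
without_history = first_onset_ages(events, ICD_TO_PHECODE, HISTORY_CODES)
print(with_history)
print(without_history)
```

In this toy record, keeping the “history” code assigns PHE_3 an onset age of 60, which is only the age at which the code was recorded; skipping it leaves PHE_3 without an onset age. Comparing the two outputs across a cohort is one way to gauge how much such codes could shift an age-of-first-onset analysis.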

In conclusion, we have developed a unique PheWAS strategy to uncover a range of associations between various phenotypes and SAIs. Our study offers a comprehensive, hypothesis-generating catalogue of phenotypes associated with SAIs, establishing a foundation for future research that will hopefully benefit SAI prevention and treatment.

Supporting information

S1 Table.

S1.1 Table. Cohort demographics for PheWAS of phecode occurrences. S1.2 Table. Cohort demographics for PheWAS of phecode age of first onset.

https://doi.org/10.1371/journal.pone.0303395.s005

(XLSX)

S2 Table. Statistical models and SAI risk factors used.

https://doi.org/10.1371/journal.pone.0303395.s006

(XLSX)

S3 Table.

S3.1 Table. PheWAS results for SAI occurrences vs. phecode occurrences. Basic. S3.2 Table. PheWAS results for SAI occurrences vs. phecode occurrences. After removing patients with known SAI risk factors (only sentinel phecodes were screened).

https://doi.org/10.1371/journal.pone.0303395.s007

(XLSX)

S4 Table.

S4.1 Table. PheWAS results for SAI age of first onset vs. phecode age of first onset. Basic. S4.2 Table. PheWAS results for SAI age of first onset vs. phecode age of first onset. After removing patients with known SAI risk factors (only sentinel phecodes were screened).

https://doi.org/10.1371/journal.pone.0303395.s008

(XLSX)

S6 Table. Classification of sentinel phecodes into disease categories.

https://doi.org/10.1371/journal.pone.0303395.s010

(XLSX)

Acknowledgments

The authors would like to thank Dr. Daniel Musher, MD, a Distinguished Service Professor of Medicine-Infectious Disease at Baylor College of Medicine, Houston, TX, and Dr. Joseph John Jr, MD, a retired staff physician at VA Medical Center in Charleston, SC, for critically reviewing the manuscript. We would also like to thank Dr. David Puthoff for editorial assistance.

References

  1. Denny JC, Ritchie MD, Basford MA, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene—disease associations. Bioinformatics. 2010; 26(9):1205–1210. pmid:20335276
  2. Haupert SR, Shi X, Chen C, Fritsche LG, Mukherjee B. A Case-Crossover Phenome-wide association study (PheWAS) for understanding Post-COVID-19 diagnosis patterns. J Biomed Inform. 2022; 136:104237. pmid:36283580
  3. Lowy F. Staphylococcus aureus Infections. N Engl J Med. 1998; 339(8):520–32. pmid:9709046
  4. Tak T, Reed KD, Haselby RC, McCauley CS, Shukla SK. An update on the epidemiology, pathogenesis and management of infective endocarditis with emphasis on Staphylococcus aureus. WMJ. 2002; 101(7):24–33. pmid:12426917
  5. Dreyfus JG, Yu H, Begier E, Gayle J, Olsen MA. Incidence and burden of Staphylococcus aureus infection after orthopedic surgeries. Infect Control Hosp Epidemiol. 2021/05/26 ed. Cambridge University Press; 2022; 43(1):64–71. pmid:34034839
  6. Hindy J-R, Quintero-Martinez JA, Lee AT, et al. Incidence Trends and Epidemiology of Staphylococcus aureus Bacteremia: A Systematic Review of Population-Based Studies. Cureus. Cureus, Inc.; 2022; 14(5):e25460. pmid:35774691
  7. Stewardson AJ, Allignol A, Beyersmann J, et al. The health and economic burden of bloodstream infections caused by antimicrobial-susceptible and non-susceptible Enterobacteriaceae and Staphylococcus aureus in European hospitals, 2010 and 2011: a multicentre retrospective cohort study. Eurosurveillance [Internet]. 2016; 21(33). Available from: https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2016.21.33.30319 pmid:27562950
  8. Klein EY, Jiang W, Mojica N, et al. National Costs Associated With Methicillin-Susceptible and Methicillin-Resistant Staphylococcus aureus Hospitalizations in the United States, 2010–2014. Clin Infect Dis. 2019; 68(1):22–28. pmid:29762662
  9. Jensen AG, Wachmann CH, Poulsen KB, et al. Risk Factors for Hospital-Acquired Staphylococcus aureus Bacteremia. Arch Intern Med. 1999; 159(13):1437–1444. pmid:10399895
  10. Kutlu SS, Cevahir N, Akalin S, et al. Prevalence and risk factors for methicillin-resistant Staphylococcus aureus colonization in a diabetic outpatient population: A prospective cohort study. Am J Infect Control. Elsevier; 2012; 40(4):365–368.
  11. Acheson LS, Siefried KJ, Clifford B, et al. One-third of people who inject drugs are at risk of incomplete treatment for Staphylococcus aureus bacteraemia: a retrospective medical record review. Int J Infect Dis. Elsevier; 2021; 112:63–65. pmid:34520844
  12. Kluytmans J, van Belkum A, Verbrugh H. Nasal carriage of Staphylococcus aureus: epidemiology, underlying mechanisms, and associated risks. Clin Microbiol Rev. American Society for Microbiology; 1997; 10(3):505–520. pmid:9227864
  13. Miller LS, Fowler VG Jr, Shukla SK, Rose WE, Proctor RA. Development of a vaccine against Staphylococcus aureus invasive infections: Evidence based on human immunity, genetics and bacterial evasion mechanisms. FEMS Microbiol Rev. 2020; 44(1):123–153. pmid:31841134
  14. Denny JC, Devaney SA, Gebo KA. The “All of Us” Research Program. Reply. N Engl J Med. 2019; 381(19):1884–1885.
  15. Wei W-Q, Bastarache LA, Carroll RJ, et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLOS ONE. Public Library of Science; 2017; 12(7):e0175508. pmid:28686612
  16. Wang L, Zhang X, Meng X, et al. Methodology in phenome-wide association studies: a systematic review. J Med Genet. 2021; 58(11):720. pmid:34272311
  17. Denny JC, Bastarache L, Ritchie MD, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013; 31(12):1102–1111. pmid:24270849
  18. Thurlow LR, Stephens AC, Hurley KE, Richardson AR. Lack of nutritional immunity in diabetic skin infections promotes Staphylococcus aureus virulence. Sci Adv. American Association for the Advancement of Science; 6(46):eabc5569. pmid:33188027
  19. Fowler VG Jr, Olsen MK, Corey GR, et al. Clinical Identifiers of Complicated Staphylococcus aureus Bacteremia. Arch Intern Med. 2003; 163(17):2066–2072. pmid:14504120
  20. Van den Bruel A, Thompson MJ, Haj-Hassan T, et al. Diagnostic value of laboratory tests in identifying serious infections in febrile children: systematic review. BMJ [Internet]. BMJ Publishing Group Ltd; 2011; 342. Available from: https://www.bmj.com/content/342/bmj.d3082 pmid:21653621
  21. Martin B-J, Buth KJ, Arora RC, Baskett RJ. Delirium as a predictor of sepsis in post-coronary artery bypass grafting patients: a retrospective cohort study. Crit Care. 2010; 14(5):R171. pmid:20875113
  22. Xie Y, Yang L. Calcium and Magnesium Ions Are Membrane-Active against Stationary-Phase Staphylococcus aureus with High Specificity. Sci Rep. 2016; 6(1):20628. pmid:26865182
  23. Braga IA, Pirett CCNS, Ribas RM, Filho PPG, Filho AD. Bacterial colonization of pressure ulcers: assessment of risk for bloodstream infection and impact on patient outcomes. J Hosp Infect. Elsevier; 2013; 83(4):314–320. pmid:23313027
  24. Fayolle M, Morsli M, Gelis A, et al. The Persistence of Staphylococcus aureus in Pressure Ulcers: A Colonising Role. Genes. 2021; 12(12). pmid:34946833
  25. Dana AN, Bauman WA. Bacteriology of pressure ulcers in individuals with spinal cord injury: What we know and what we should know. J Spinal Cord Med. Taylor & Francis; 2015; 38(2):147–160. pmid:25130374
  26. Hatlen TJ, Miller LG. Staphylococcal Skin and Soft Tissue Infections. Skin Soft Tissue Infect. 2021; 35(1):81–105. pmid:33303329
  27. Woods C, Colice G. Methicillin-resistant Staphylococcus aureus pneumonia in adults. Expert Rev Respir Med. Taylor & Francis; 2014; 8(5):641–651.
  28. Lim SY, Pannikath D, Nugent K. A retrospective study of septic arthritis in a tertiary hospital in West Texas with high rates of methicillin-resistant Staphylococcus aureus infection. Rheumatol Int. 2015; 35(7):1251–1256. pmid:25572838
  29. Mehta MS, Paule SM, Thomson RB, Kaul KL, Peterson LR. Identification of Staphylococcus Species Directly from Positive Blood Culture Broth by Use of Molecular and Conventional Methods. J Clin Microbiol. American Society for Microbiology; 2009; 47(4):1082–1086. pmid:19213701
  30. Montinari MR, Minelli S, De Caterina R. The first 3500 years of aspirin history from its roots—A concise summary. Vascul Pharmacol. 2019; 113:1–8.
  31. Loof TG, Mörgelin M, Johansson L, et al. Coagulation, an ancestral serine protease cascade, exerts a novel function in early immune defense. Blood. 2011; 118(9):2589–2598. pmid:21613262
  32. Sedlacek M, Gemery JM, Cheung AL, Bayer AS, Remillard BD. Aspirin Treatment Is Associated With a Significantly Decreased Risk of Staphylococcus aureus Bacteremia in Hemodialysis Patients With Tunneled Catheters. Am J Kidney Dis. Elsevier; 2007; 49(3):401–408. pmid:17336701
  33. Pishchany G, McCoy AL, Torres VJ, et al. Specificity for Human Hemoglobin Enhances Staphylococcus aureus Infection. Cell Host Microbe. Elsevier; 2010; 8(6):544–550. pmid:21147468
  34. Weiss G, Goodnough LT. Anemia of Chronic Disease. N Engl J Med. Massachusetts Medical Society; 2005; 352(10):1011–1023. pmid:15758012
  35. Musher DM, Alexandraki I, Graviss EA, et al. Bacteremic and Nonbacteremic Pneumococcal Pneumonia A Prospective Study. Medicine (Baltimore) [Internet]. 2000; 79(4). Available from: https://journals.lww.com/md-journal/Fulltext/2000/07000/Bacteremic_and_Nonbacteremic_Pneumococcal.2.aspx
  36. Santosa CM, Megarani DV, Arifianto D, Salasia SIO. The mice’s hematological effect of given the Staphylococcus aureus and Persea americana. BIO Web Conf [Internet]. 2021; 33. Available from:
  37. Nissenson AR, Dylan ML, Griffiths RI, et al. Clinical and Economic Outcomes of Staphylococcus aureus Septicemia in ESRD Patients Receiving Hemodialysis. Am J Kidney Dis. Elsevier; 2005; 46(2):301–308. pmid:16112049
  38. Smit J, Adelborg K, Thomsen RW, Søgaard M, Schønheyder HC. Chronic heart failure and mortality in patients with community-acquired Staphylococcus aureus bacteremia: a population-based cohort study. BMC Infect Dis. 2016; 16(1):227. pmid:27225712
  39. Arvanitaki A, Ibrahim W, Shore D, et al. Epidemiology and management of Staphylococcus Aureus infective endocarditis in adult patients with congenital heart disease: A single tertiary center experience. Int J Cardiol. Elsevier; 2022; 360:23–28. pmid:35500817
  40. Chira S, Miller LG. Staphylococcus aureus is the most common identified cause of cellulitis: a systematic review. Epidemiol Infect. 2009/08/03 ed. Cambridge University Press; 2010; 138(3):313–317. pmid:19646308
  41. Eells SJ, Chira S, David CG, Craft N, Miller LG. Non-suppurative cellulitis: risk factors and its association with Staphylococcus aureus colonization in an area of endemic community-associated methicillin-resistant S. aureus infections. Epidemiol Infect. 2010/06/21 ed. Cambridge University Press; 2011; 139(4):606–612.
  42. Swartz MN. Cellulitis. N Engl J Med. Massachusetts Medical Society; 2004; 350(9):904–912.
  43. Yamasaki O, Kaneko J, Morizane S, et al. The Association between Staphylococcus aureus Strains Carrying Panton-Valentine Leukocidin Genes and the Development of Deep-Seated Follicular Infection. Clin Infect Dis. 2005; 40(3):381–385. pmid:15668860
  44. Lina G, Piémont Y, Godail-Gamot F, et al. Involvement of Panton-Valentine Leukocidin—Producing Staphylococcus aureus in Primary Skin Infections and Pneumonia. Clin Infect Dis. 1999; 29(5):1128–1132. pmid:10524952
  45. Masiuk H, Kopron K, Grumann D, et al. Association of Recurrent Furunculosis with Panton-Valentine Leukocidin and the Genetic Background of Staphylococcus aureus. J Clin Microbiol. American Society for Microbiology; 2010; 48(5):1527–1535.
  46. Moore C, Davis NF, Burke JP, et al. Colonisation with methicillin-resistant Staphylococcus aureus prior to renal transplantation is associated with long-term renal allograft failure. Transpl Int. John Wiley & Sons, Ltd; 2014; 27(9):926–930. pmid:24853293
  47. Nielsen LH, Jensen-Fangel S, Benfield T, et al. Risk and prognosis of Staphylococcus aureus bacteremia among individuals with and without end-stage renal disease: a Danish, population-based cohort study. BMC Infect Dis. 2015; 15(1):6. pmid:25566857
  48. Draaijers L, Hassing R, Kooistra M, Van Kessel K, Hovens M. Severe Acquired Coagulopathy During Fulminant Staphylococcus aureus Sepsis Most Likely Caused by S. aureus Exotoxins (SSLs). Eur J Case Rep Intern Med. 2018; 5(12):0001002. pmid:30756003
  49. Sorbie C. STAPHYLOCOCCAL SEPTICÆMIA. The Lancet. Elsevier; 1961; 278(7215):1284–1285.
  50. Bader MS. Staphylococcus aureus Bacteremia in Older Adults: Predictors of 7-Day Mortality and Infection With a Methicillin-Resistant Strain. Infect Control Hosp Epidemiol. 2016/06/21 ed. Cambridge University Press; 2006; 27(11):1219–1225. pmid:17080380
  51. Gofton TE, Young GB. Sepsis-associated encephalopathy. Nat Rev Neurol. 2012; 8(10):557–566. pmid:22986430
  52. Lin W-T, Wu C-D, Cheng S-C, et al. High Prevalence of Methicillin-Resistant Staphylococcus aureus among Patients with Septic Arthritis Caused by Staphylococcus aureus. PLOS ONE. Public Library of Science; 2015; 10(5):e0127150. pmid:25996145
  53. Rubinstein E, Kollef MH, Nathwani D. Pneumonia Caused by Methicillin-Resistant Staphylococcus aureus. Clin Infect Dis. 2008; 46(Supplement_5):S378–S385. pmid:18462093
  54. Kavanagh N, Ryan EJ, Widaa A, et al. Staphylococcal Osteomyelitis: Disease Progression, Treatment Challenges, and Future Directions. Clin Microbiol Rev. American Society for Microbiology; 2018; 31(2): pmid:29444953
  55. Miller LG, Quan C, Shay A, et al. A Prospective Investigation of Outcomes after Hospital Discharge for Endemic, Community-Acquired Methicillin-Resistant and -Susceptible Staphylococcus aureus Skin Infection. Clin Infect Dis. 2007; 44(4):483–492. pmid:17243049
  56. Heal C, Harvey A, Brown S, Rowland AG, Roland D. The association between temperature, heart rate, and respiratory rate in children aged under 16 years attending urgent and emergency care settings. Eur J Emerg Med [Internet]. 2022; 29(6). Available from: https://journals.lww.com/euro-emergencymed/Fulltext/2022/12000/The_association_between_temperature,_heart_rate,.7.aspx pmid:35679531
  57. Citla Sridhar D, Maher OM, Rodriguez NI. Pediatric Deep Venous Thrombosis Associated With Staphylococcal Infections: Single Institutional Experience. J Pediatr Hematol Oncol [Internet]. 2018; 40(2). Available from: https://journals.lww.com/jpho-online/Fulltext/2018/03000/Pediatric_Deep_Venous_Thrombosis_Associated_With.23.aspx pmid:29200147
  58. Lu M, Fu M, Zhang Y, Shen T, Xie H, Liu D. Septicaemia with deep venous thrombosis and necrotising pneumonia caused by acute community-acquired methicillin-resistant Staphylococcus aureus in an infant with a three-year follow-up: a case report. BMC Infect Dis. 2022; 22(1):189. pmid:35209857
  59. Jia Y, Wang W, Wang X, Jiao L, Wang Y. Staphylococcus aureus meningitis complicated with intracranial hemorrhage and cerebral infarction: a case report. Int J Neurosci. Taylor & Francis; 2022; 132(12):1221–1224. pmid:33491526
  60. Dobrin RS, Day NK, Quie PG, et al. The role of complement, immunoglobulin and bacterial antigen in coagulase-negative staphylococcal shunt nephritis. Am J Med. Elsevier; 1975; 59(5):660–673. pmid:1106192
  61. Koyama A, Kobayashi M, Yamaguchi N, et al. Glomerulonephritis associated with MRSA infection: A possible role of bacterial superantigen. Kidney Int. 1995; 47(1):207–216. pmid:7731148
  62. Satoskar AA, Shapiro JP, Jones M, et al. Differentiating Staphylococcus infection-associated glomerulonephritis and primary IgA nephropathy: a mass spectrometry-based exploratory study. Sci Rep. 2020; 10(1):17179. pmid:33057112
  63. Froberg MK, Palavecino E, Dykoski R, Gerding DN, Peterson LR, Johnson S. Staphylococcus aureus and Clostridium difficile Cause Distinct Pseudomembranous Intestinal Diseases. Clin Infect Dis. 2004; 39(5):747–750. pmid:15356793
  64. Harriott MM, Noverr MC. Candida albicans and Staphylococcus aureus Form Polymicrobial Biofilms: Effects on Antimicrobial Resistance. Antimicrob Agents Chemother. American Society for Microbiology; 2009; 53(9):3914–3922.
  65. Schlecht LM, Peters BM, Krom BP, et al. Systemic Staphylococcus aureus infection mediated by Candida albicans hyphal invasion of mucosal tissue. Microbiology. Microbiology Society; 2015. p. 168–181. pmid:25332378
  66. Kosami K, Kenzaka T, Sagara Y, Minami K, Matsumura M. Clinically mild encephalitis/encephalopathy with a reversible splenial lesion caused by methicillin-sensitive Staphylococcus aureus bacteremia with toxic shock syndrome: a case report. BMC Infect Dis. 2016; 16(1):160. pmid:27091490
  67. Cohen SP, Wang EJ, Doshi TL, Vase L, Cawcutt KA, Tontisirin N. Chronic pain and infection: mechanisms, causes, conditions, treatments, and controversies. BMJ Med. 2022; 1(1):e000108. pmid:36936554
  68. Mölkänen T, Ruotsalainen E, Rintala EM, Järvinen A. Predictive Value of C-Reactive Protein (CRP) in Identifying Fatal Outcome and Deep Infections in Staphylococcus aureus Bacteremia. PLOS ONE. Public Library of Science; 2016; 11(5):e0155644. pmid:27182730
  69. Botheras CL, Bowe SJ, Cowan R, Athan E. C-reactive protein predicts complications in community-associated S. aureus bacteraemia: a cohort study. BMC Infect Dis. 2021; 21(1):312. pmid:33794783