Abstract
Background
A phenome-wide association study (PheWAS) is a powerful tool designed to systematically screen clinical observations derived from medical records (phenotypes) for association with a variable of interest. Despite the usefulness of this approach, no systematic screening of phenotypes associated with Staphylococcus aureus infections (SAIs) has been done, leaving potential novel risk factors or complications undiscovered.
Method and cohorts
We tailored the PheWAS approach into a two-stage screening procedure to identify novel phenotypes correlating with SAIs. The first stage screened for co-occurrence of SAIs with other phenotypes within medical records. In the second stage, significant findings were examined for the correlations between their age of onset with that of SAIs. The PheWAS was implemented using the medical records of 754,401 patients from the Marshfield Clinic Health System. Any novel associations discovered were subsequently validated using datasets from TriNetX and All of Us, encompassing 109,884,571 and 118,538 patients respectively.
Results
Forty-one phenotypes met the significance criteria of a p-value < 3.64e-5 and odds ratios of > 5. Out of these, we classified 23 associations either as risk factors or as complications of SAIs. Three novel associations were discovered and classified either as a risk (long-term use of aspirin) or complications (iron deficiency anemia and anemia of chronic disease). All novel associations were replicated in the TriNetX cohort. In the All of Us cohort, anemia of chronic disease was replicated according to our significance criteria.
Conclusions
The PheWAS of SAIs expands our understanding of SAIs interacting phenotypes. Additionally, the novel two-stage PheWAS approach developed in this study can be applied to examine other disease-disease interactions of interest. Due to the possibility of bias inherent in observational data, the findings of this study require further investigation.
Citation: Allaire P, Elsayed NS, Berg RL, Rose W, Shukla SK (2024) Phenome-wide association study identifies new clinical phenotypes associated with Staphylococcus aureus infections. PLoS ONE 19(7): e0303395. https://doi.org/10.1371/journal.pone.0303395
Editor: Kunyan Zhang, University of Calgary, CANADA
Received: October 5, 2023; Accepted: April 23, 2024; Published: July 5, 2024
Copyright: © 2024 Allaire et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: Part of this work was funded by Weber Endowment to SKS, Ebenreiter Postdoctoral Fellowship in Precision Medicine Research to NE and the donor dollars to Marshfield Clinic Research Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Although the Electronic Health Record (EHR) became mainstream in the United States’ healthcare systems in the early 2000s, research using these databases had been somewhat limited in scope. In 2010, a seminal Phenome-Wide Association Study (PheWAS) found that EHR data (e.g., a phenotype) can be screened against a genetic variant to replicate known genomic associations [1]. This has helped in understanding the phenotypic pleiotropy associated with genetic variants. Within this PheWAS paradigm, the genetic variant can be replaced by any other variable of interest, for instance the clinical phenotypes presented by Staphylococcus aureus infections (SAIs). With large cohorts and rich longitudinal EHR data, this adaptation of the PheWAS offers a powerful opportunity to identify the spectrum of clinical phenotypes associated with SAIs, as has been shown for COVID-19 [2].
Diseases such as bacteremia, endocarditis, and osteomyelitis resulting from S. aureus cause significant morbidity and mortality [3,4]. SAIs (referring to all diseases caused by S. aureus) pose a major problem in both inpatient and outpatient settings [5]. For example, the incidence rate of S. aureus bacteremia is up to 65 cases/100,000 patients/year [6]. The consequences of SAIs include high mortality, prolonged hospitalization, and excessive healthcare costs [7]. In a 2010–2014 study, both MRSA (methicillin-resistant S. aureus) and MSSA (methicillin-sensitive S. aureus) led to excessive hospitalization costs [8]. Risk factors for SAIs include prolonged hospitalization, surgical procedures, immunocompromised status, type 2 diabetes, and glucocorticoid treatment [9–12]. Effective management of SAIs cannot be accomplished without a full understanding of all known and yet-to-be-known disease risk factors and downstream disease complications [13]. Given the intricacies inherent in studying patients with SAIs and the sizeable amount of data available in modern EHR systems, larger and more comprehensive studies can now be done, which increases the opportunity to find novel associations with SAIs. This, in turn, may improve our understanding of SAIs.
To explore the SAIs-phenome interaction spectrum, we implemented a two-step PheWAS to identify novel associations using EHR data from the multispecialty Marshfield Clinic Health System (MCHS) in Marshfield, WI. Among the many associations found, we identified three new phenotypes associated with SAIs, and these associations were reproduced in datasets from All of Us, a National Institutes of Health research database, and TriNetX, a global health research network.
Materials and methods
Ethics statement: Human subject research
This study utilized data from three cohorts: MCHS, TriNetX, and the All of Us Research Program (AoURP). The data from these cohorts were previously de-identified. The authors had no access to any type of data that could potentially identify participants, except for the ICD code dates required to establish time correlations between two diseases. This manuscript neither discusses individual-level data nor gives exact group sizes for groups smaller than 20 individuals.
- MCHS: The research contained in this article was approved by the institutional review board of the Marshfield Clinic Research Institute. Approval IRB-18-056 was granted to Sanjay K Shukla, Ph.D., on December 12th, 2021 for study number SHU10614. Informed consent was not required, as determined by the MCHS IRB, because all data were analyzed anonymously.
- TriNetX: The utilization of data from the TriNetX platform was exempted from requiring ethical approval at the researcher level. This exemption is due to the thorough de-identification process the data undergoes, which has been certified as HIPAA-compliant through expert determination. Since both TriNetX and the All of Us data are de-identified, and the research did not involve any intervention or interaction with living individuals, it is classified as "Not Human Subject" research. For more details on the TriNetX de-identification process, refer to: https://trinetx.com/wp-content/uploads/2022/04/TriNetX-Empirical-Summary-by-Brad-Malin-2020_branded.pdf
- All of Us Research Program: Similarly, the use of data from the All of Us Research Program doesn’t require ethical approval at the researcher level, thanks to its comprehensive de-identification procedure. While All of Us has specific criteria, such as studying groups smaller than 20, which would necessitate an IRB approval, our study doesn’t fall within these parameters. More about the All of Us Research Program’s IRB-approved protocol can be accessed here: https://allofus.nih.gov/about/all-us-research-program-protocol
Study cohorts description
The first cohort, used for discovery, consisted of EHR data from 754,401 patients from MCHS. Our inclusion criteria were a minimum age of 18 years and at least 5 years of EHR data, defined by any ICD code entry on two separate days. We extracted the dataset for this cohort once, on March 20th, 2022. The second cohort, TriNetX, consisted of 82 healthcare organizations with combined EHR data from 109,884,571 patients. We accessed these data directly from the TriNetX platform on May 15th, 2023 and used them to calculate odds ratios (see below). Note that, owing to limitations of TriNetX data access, the only initial inclusion criterion for this cohort was an age of at least 18 years, with no restriction on the length of the EHR. To investigate the correlation of age of onset between SAIs and other phenotypes, we downloaded a TriNetX sub-cohort that included only patients with SAIs, together with their full, de-identified EHR records. This allowed us to restrict patient eligibility as described above for the MCHS cohort. This dataset was obtained on March 28th, 2022. The third cohort was derived from All of Us [14] and consisted of 118,538 participants filtered from 413,457 initial participant entries (CDR version C2022Q4R9, May 2023). Of the 413,457 participants, 287,012 had some type of EHR data accessible, and 254,487 had EHR data defined by any ICD code entry on two separate days. We then restricted the cohort to participants with five or more years of EHR data, which gave 166,790 individuals, of whom 118,538 had genetically predicted ancestry; these we retained. The TriNetX research network and the All of Us cohort were used as replication cohorts for any novel SAI associations identified in the MCHS cohort. Demographics of the cohorts used in the first-step PheWAS are presented in S1.1 Table in S1 Table and those for the second-step PheWAS in S1.2 Table in S1 File.
Two steps PheWAS screen
The two-step PheWAS screening was conducted with the MCHS dataset, as outlined in Fig 1, to identify novel associations. The dual PheWAS approach uses different input parameters to evaluate associations between SAIs and a phecode (phecodes are codes that rapidly define case/control status for hundreds of clinical conditions). Briefly, a disease in the EHR is defined by ICD-9 and ICD-10 codes. These ICD codes are mapped to 1866 phecodes (extracted from phecode map version 1.2; https://phewascatalog.org/phecodes) [15] for use in the PheWAS. In the first PheWAS, we screened medical records for phecodes that co-occurred with the SAIs phecode, 41.1. We chose phecode 41.1 because it encompasses all forms of methicillin-sensitive or methicillin-resistant SAIs at any clinical site, including blood. Phecode 41.1 also includes the history of the disease, as stated elsewhere. Here, a logistic regression model was used in which the response variable was SAIs and the other phenotype was the predictor. The model included age at the last healthcare visit, sex, and ancestry (or race) as covariates. Note that race was used instead of ethnicity in TriNetX because the latter was not available. Basic cohort characteristics for this PheWAS are given in S1.1 Table in S1 Table. Given that SAIs are definitive diagnoses, we required only one occurrence of a phecode to designate a case. Patients without an SAIs phecode record were labelled as controls. We considered association results only when at least 50 patients were coded for both SAIs and the test phecode [16]. Although we acknowledge the subjective nature of this minimum count, the goal here was to screen on sample size and remove imprecise estimates from consideration.
Flow chart of the PheWAS 1 and 2 screening approaches and steps used in PheWAS 1 and PheWAS 2 statistical model.
This screening left 1373 phecodes tested. All phecodes that reached the Bonferroni-adjusted p-value threshold for multiple testing (p<3.64e-5, alpha = 0.05, 1373 tests) with an odds ratio (OR) of ≥ 5 were carried into the second, sequential PheWAS. The high OR cutoff, which retained associations above the 75th percentile, reflects our goal of focusing only on the most impactful associations, although we understand that less significant but relevant associations may be missed.
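The first-stage filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Association` record, the function names, and the toy values are hypothetical, and the study itself ran its analyses in R. The sketch applies the minimum co-occurrence count, derives the Bonferroni threshold from the number of phecodes actually tested, and keeps associations with an OR of at least 5.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """One first-stage result (hypothetical record layout)."""
    phecode: str
    n_both: int        # patients coded for both SAIs and the test phecode
    odds_ratio: float
    p_value: float


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def first_stage_hits(results, alpha=0.05, min_both=50, min_or=5.0):
    """Keep associations that pass the count, p-value, and OR filters."""
    # Only phecodes with enough co-occurring patients count as tested.
    tested = [r for r in results if r.n_both >= min_both]
    if not tested:
        return []
    threshold = bonferroni_threshold(alpha, len(tested))
    return [r for r in tested
            if r.p_value < threshold and r.odds_ratio >= min_or]


# With the counts reported in the text: 0.05 / 1373 is about 3.64e-5.
print(f"{bonferroni_threshold(0.05, 1373):.3g}")
```

The same helper gives the second-stage threshold, since 0.05 / 99 is about 5.05e-4.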
The second PheWAS tested the relation of age at first onset between SAIs and a phecode. To avoid statistical inflation caused by one-time visits that could lead to aberrant simultaneous coding of SAIs and any other phecode, we excluded any patient where the onsets of SAIs and the phecode were within 24 h (1 day). Age at first SAIs (defined as the first ICD code entry mapped to SAI) was assigned as the response variable and onset of the test phenotype (defined as the first ICD code entry mapped to this phenotype) as a predictor variable. Covariates for adjustment included age, sex, and race at the last visit. Basic cohort characteristics for this PheWAS are given in S1.2 Table in S1 File. In this second screen, we considered only association test results where at least 50 patients presented the phecode within 60 days of onset of SAIs. We retained phecodes reaching Bonferroni adjusted p-value threshold for multiple testing (p<5.05e-4, alpha = 0.05, 99 tests) for further classification either as risk factors or complications of a SAI, based on which condition appeared first.
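The eligibility rules for the second screen can be illustrated with a short Python sketch. It assumes that onset ages are available in days and that "within 24 h" means an absolute onset difference of at most one day; both are assumptions made for illustration, as are the function names. The regression itself is not shown.

```python
def onset_differences(onsets, exclusion_days=1.0):
    """Return (phecode onset - SAI onset) in days for each retained patient.

    `onsets` is a list of (phecode_onset_days, sai_onset_days) pairs.
    Patients whose two onsets fall within `exclusion_days` of each other
    are dropped, mirroring the exclusion of near-simultaneous coding.
    """
    diffs = []
    for phecode_onset, sai_onset in onsets:
        diff = phecode_onset - sai_onset
        if abs(diff) <= exclusion_days:
            continue  # likely coded during the same visit
        diffs.append(diff)
    return diffs


def eligible_for_second_stage(diffs, window_days=60, min_patients=50):
    """True when enough patients present the phecode near the SAI onset."""
    n_acute = sum(1 for d in diffs if abs(d) <= window_days)
    return n_acute >= min_patients
```

A negative difference means the phecode was first coded before the SAI, and a positive one means it was first coded afterwards, which is the distinction used later to separate risk factors from complications.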
Next, we removed patients with known risk factors, including: phecodes 197 (chemotherapy), 250 (all types of diabetes), 429.1 (heart transplant/surgery), 510.2 (lung transplant), 573.2 (liver replacement by transplant), 585.32 (end stage renal disease), 851 (complications of transplant and reattached limbs), and 860 (bone marrow or stem cell transplant). Once this removal was complete, we performed the second PheWAS. This order helped account for known risk factors. Statistical models are formally defined in S2 Table. All statistical analyses were performed in R, version 3.6.3 or higher, using routine basic packages.
Replication of novel associations with TriNetX and All of Us
For Table 3, we calculated odds ratios (ORs) using a contingency table because the TriNetX web-based analysis platform limits simultaneous access to all TriNetX EHR data. These ORs are therefore not adjusted for age, sex, and race. To allow a fair comparison, we also recalculated the MCHS ORs using a contingency table and data derived from TriNetX (which includes MCHS) to produce two comparable datasets. The PheWAS of age of first onset used linear regression models and was performed as described above. The All of Us analyses were performed as detailed for TriNetX.
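For readers unfamiliar with the contingency-table calculation, the following Python sketch shows an unadjusted OR computed from a 2×2 table. The cell layout and the function name are illustrative. The confidence interval uses the standard log-OR (Woolf) approximation, which is an assumption on our part because the text does not state how intervals were obtained.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with an approximate 95% confidence interval.

    a: SAI and phecode      b: SAI, no phecode
    c: no SAI, phecode      d: neither
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Toy counts, not study data: OR = (20 * 190) / (80 * 10) = 4.75
print(odds_ratio_2x2(20, 80, 10, 190)[0])
```

Because no covariates enter this calculation, such ORs are not directly comparable with the regression-adjusted ORs from the discovery PheWAS.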
Redundant phecode pruning
Since phecodes within a class are often biologically similar, reporting the whole class may provide redundant information. To limit redundancy, we selected a sentinel phenotype for each class of phecodes, namely the one with the highest OR. For example, phecode class 250 represents both types of diabetes, but the biology of type I diabetes (T1D) differs from that of type II diabetes (T2D), and we made that distinction in sentinel phecode selection. The sub-class phecode 250.1 represents T1D and has six sub-phecodes (250.10 to 250.15). The sub-class phecode 250.2 represents T2D and also has six sub-phecodes (250.20 to 250.25). We considered only sub-phecodes with a p-value that met the Bonferroni threshold and, among those, chose the one with the highest OR as the sentinel phecode. Using T1D as an example, phecodes 250.10 and 250.13 were significant, but 250.13 had a higher OR than 250.10. Therefore, only 250.13 was reported, and it represented the sentinel phecode of that group. A similar approach was used for T2D. If there were no obvious underlying biological distinctions within a phecode class, which was true for most classes, all sub-codes were curated together.
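The sentinel selection rule can be sketched as follows in Python. The grouping helpers and all ORs and p-values in the example are invented for illustration; only the rule itself (significant sub-phecodes only, highest OR wins within a group) comes from the text.

```python
def by_class(phecode: str) -> str:
    """Group by the whole phecode class, e.g. '427.21' -> '427'."""
    return phecode.split(".")[0]


def by_subclass(phecode: str) -> str:
    """Group by the first decimal, e.g. '250.13' -> '250.1' (T1D vs T2D)."""
    head, _, tail = phecode.partition(".")
    return f"{head}.{tail[:1]}" if tail else head


def select_sentinels(results, threshold, group_of=by_class):
    """Pick, per group, the significant phecode with the highest OR.

    `results` maps phecode -> (odds_ratio, p_value).
    """
    best = {}
    for phecode, (odds_ratio, p_value) in results.items():
        if p_value >= threshold:
            continue  # not significant after Bonferroni correction
        group = group_of(phecode)
        if group not in best or odds_ratio > results[best[group]][0]:
            best[group] = phecode
    return best


# Toy values only: 250.11 has the largest OR but is not significant.
toy = {
    "250.10": (5.5, 1e-8), "250.11": (9.0, 0.2), "250.13": (7.9, 1e-9),
    "250.20": (5.1, 1e-7), "250.25": (6.3, 1e-10),
}
print(select_sentinels(toy, 3.64e-5, group_of=by_subclass))
```

Passing `by_class` instead of `by_subclass` would curate all sub-codes of a class together, which corresponds to the default case where no biological distinction is made.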
Results
The MCHS cohort, which was utilized as the discovery cohort, consisted of 754,401 patients, 52.4% of whom were female. The average age of the cohort was 55.6 years (SD = 20.7), with an average EHR length of 20.5 (SD = 10.2) years. It is notable that a significant proportion of the MCHS patient population (61.1%) self-identified as white of European descent. However, the ethnicity of a considerable fraction (36.7%) of this EHR population remained unknown (S1.1 Table in S1 Table).
The two steps PheWAS screen identified 41 phecodes associated with SAIs
The two-step PheWAS process with a threshold for flagging associations is illustrated in Fig 1, with details of the individual PheWAS model described in S2 Table. All statistical models were adjusted for age and sex at the last diagnosis. The only exception to this was the manual calculation of ORs performed with TriNetX and the equivalent for MCHS and AoURP in Table 3 (see Methods). The first PheWAS identified 236 associations (S3.1 Table in S3 Table), and these were carried forward to the second PheWAS for age correlation. Due to our criteria, which required at least 50 patients to have their first phecode registered within 60 days of the first SAI recorded, and the removal of individuals coded within 24 hours of an SAI (to avoid potential data inflation from one-visit specialty treatments), only 99 out of the 236 phecodes were eligible for testing in the second PheWAS. After the second screening, 94 phecodes remained significant (S4.1 Table in S4 Table). However, many of these phecodes belonged to the same phecode class, rendering them biologically similar and redundant. To address this, we selected the top association within each phecode class based on its p-value significance and the highest OR (see Materials and methods and S5 Table). We referred to this top association as the ’sentinel phecode’. This selection process reduced the number of significant associations to 41. The summary statistics from the two-step PheWAS for these 41 phecodes are presented in Table 1.
Sentinel phecodes associated with SAIs are enriched in the circulatory system disease category
We classified phecodes into disease categories to give a general sense of their association with biological/physiological pathways [17] (https://phewascatalog.org/). Of the 17 disease categories, 12 contained phecodes associated with SAIs (S6 Table). An enrichment of circulatory system phecodes emerged (7/41, or 17% of total phecodes), indicating that endovascular physiology is strategically important for SAIs. This is further supported by the inclusion of five other phecodes under the hematopoietic category, which interacts with the circulatory system through the blood transport of hematopoietic cells. Altogether, these observations suggest either that a dysfunctional blood/circulatory system may provide an opportunity for infection or that SAIs can perturb the system.
Phecodes associated with SAIs are mainly risk factors
To further unravel how the 41 phecodes were associated with SAIs, we classified them as either a potential risk factor or a complication of SAIs. We counted the number of times the age of first onset of a phecode occurred before SAIs (therefore a risk) versus after (a complication). We counted occurrences (pre-SAIs and post-SAIs) for both the lifetime and the acute period, which we defined as within 60 days of SAIs onset (Table 2). Using this classification method, out of the 41 phecodes, 24 associations were classified as either risk factors or complications of SAIs, and 17 remained unclassified (referred to as cryptic). Not surprisingly, some clinical identifiers of SAIs appeared as risk factors and represent 17 out of the 23 as shown in Table 2. We searched both PubMed and Google to identify established or reported associations in the literature. An association was labeled as established if it was supported by more than one confirmatory study, a large patient cohort, or a finding with an odds ratio of >2. An association was labeled as previously reported but less than established if it was i) mentioned in a case study, ii) a result of in vitro experimentation, and/or iii) a symptom of an established association, such as shortness of breath for MRSA pneumonia (Table 2).
Ratio = counts of risk / counts of complication.
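The ratio defined above lends itself to a small worked sketch in Python. The counting follows the text (phecode onset before SAIs counts as risk, after as complication, with an optional 60-day acute window). The cutoff of 1 and the rule that both ratios must agree are assumptions made for illustration; the text states only that discordant lifetime and acute ratios were labeled cryptic.

```python
import math


def risk_complication_ratio(diffs_days, window_days=None):
    """Ratio = counts of risk / counts of complication.

    `diffs_days` holds (phecode onset - SAI onset) per patient, in days.
    Negative values mean the phecode came first (risk); positive values
    mean it came after the SAI (complication).
    """
    if window_days is not None:
        diffs_days = [d for d in diffs_days if abs(d) <= window_days]
    risk = sum(1 for d in diffs_days if d < 0)
    complication = sum(1 for d in diffs_days if d > 0)
    if complication == 0:
        return math.inf if risk > 0 else math.nan
    return risk / complication


def classify(lifetime_ratio, acute_ratio):
    """Label an association from its lifetime and acute ratios."""
    if lifetime_ratio > 1 and acute_ratio > 1:
        return "risk factor"
    if lifetime_ratio < 1 and acute_ratio < 1:
        return "complication"
    return "cryptic"  # the two time scales disagree or are uninformative


# Ratios reported in the Discussion for anemia of chronic disease after
# removing patients with known risk factors (lifetime 0.67, acute 0.69).
print(classify(0.67, 0.69))
```

Under these assumptions, the reported post-adjustment ratios for anemia of chronic disease fall on the complication side at both time scales.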
Histograms for two known risk factors compared with our three new associations are displayed in Fig 2. The histogram counts represent the difference between the age of onset of phecode X and SAIs in one-year bins. One new risk factor, long-term aspirin usage, was mainly coded prior to the onset of SAI, similar to known risk factors such as T2D and acute renal failure (Fig 2A/B/E, and Table 2). Additionally, one association that was classified as a complication was novel (iron deficiency anemia due to chronic blood loss) and showed consistent trends with both lifelong and acute categories (Fig 2D and Table 2). Finally, anemia of chronic disease is classified as cryptic (Fig 2C and Table 2) because the lifelong ratio appears as a risk while acute as a complication.
A) Acute renal failure (known association phecode 585.1). B) Type 2 diabetes (known association phecode 250.25). C) Plasma protein metabolism disorder (novel association, phecode 270.38). D) Aspirin usage (novel association, 457.3). E) Anemia of chronic disease (novel association, phecode 285.2).
Novel associations are replicated in TriNetX and All of Us
We verified the three novel associations in TriNetX and All of Us by calculating ORs using contingency tables. The ICD code for ‘long-term use of aspirin’ was not available in All of Us; as a result, we could not determine whether the association between SAIs and long-term use of aspirin is replicable in All of Us. We additionally provide ORs corrected for possible confounding effects from known risk factors (chemotherapy, diabetes, organ transplants, surgeries, end-stage renal disease) (see S2 Table for phecode/ICD10 code usage). Correction was done by removing patients with these risk factors. We observed that ORs and statistical significance were generally maintained across MCHS, TriNetX, and All of Us, although they trended higher in TriNetX and lower in All of Us, likely owing to the sizes of the cohorts (Table 3). Interestingly, removing risk factors in the MCHS cohort had an impact on the OR for anemia of chronic disease, which decreased from 16.99 to 4.07 but remained highly significant. In contrast, the OR for anemia of chronic disease in both the TriNetX and All of Us cohorts remained constant (Table 3). Consistent with the replicated ORs, the age of first onset correlation also remained significant and mostly similar in effect size in TriNetX (Table 3). Whereas anemia of chronic disease was found to be cryptic in the MCHS cohort, even after correcting for at-risk patients, TriNetX showed it as a consistent complication of SAIs (Table 3 and Fig 3).
TriNetX results after removal of patients at risk for SAIs. A) Lifelong count distribution across the medical record. B) Acute count distribution over a window of +/- 60 days centered on the age of onset of SAIs. Risk phecodes include all diabetes, acute renal failure, all transplant, and chemotherapy codes. The X axis shows the age of first onset of phecode X minus the age of first onset of SAIs, in years for A) and in days for B).
Interestingly, the OR of the "long-term use of aspirin" observation in the MCHS increased to 19.29 from 5.25 after removing patients with known risk factors, whereas in TriNetX the OR decreased to 4.86 from 6.54. The observed cohort-specific trend appears to be due to differences in the frequency of known risk factors, which, according to the data, seem to be considerably higher in the MCHS cohort. Summary statistics for PheWAS of occurrence are shown in S3.2 Table in S2 File, and those of PheWAS of age correlation are shown in S4.2 Table in S3 File. Fig 4 summarizes our findings.
The left panel describes four risk factors categories with examples of two phecodes in each category: 1) Known Risk factors, 2) Clinical identifiers, 3) S. aureus manifestations, and 4) Novel Associations. The right panel describes two novel complications after SAIs.
Discussion
Today’s ease of access to the EHR systems of large health care organizations and to centralized EHR data collections, such as TriNetX and All of Us, combined with sophisticated statistical screening methods, enables researchers to discover disease associations that could not previously be detected. In our study, we used a two-step PheWAS method to pinpoint and categorize health conditions not previously linked to SAIs. We found 41 unique health conditions associated with SAIs, grouped into 12 disease categories. The circulatory system had the largest representation, with seven conditions, highlighting its significant role in the health complications caused by SAIs, regardless of the infection source. Three of the health conditions (diabetes, congestive heart failure, and acute renal failure) have already been associated with SAIs, validating our method [18,19]. To partially summarize our results, we classified the phenotypes associated with increased risk of SAIs into four main categories: i) previously identified known risk factors, ii) clinical identifiers, iii) novel associations, and iv) Staphylococcal manifestations. We define “clinical identifier” as a known symptom of SAI that is detected prior to confirmation of an SAI. The novel association category included: a) long-term use of aspirin, b) iron deficiency anemia, and c) anemia of chronic disease.
One advantage of our algorithm was in its ability to identify clinical identifiers as risk factors. For instance, previous studies have noted white blood cell count [20] or symptoms like delirium [21] after sepsis during patient evaluation. In our study, we recorded these indicators before the formal clinical confirmation of a SAI, indicating they can serve as warning signs for risk of infection. Additionally, our study supported the link between two previously suggested clinical identifiers and SAIs: disorders of magnesium metabolism (including both hypomagnesemia and hypermagnesemia) [22] and decubitus ulcers [23]. A deficiency of magnesium may reduce innate host defense against S. aureus, increasing the risk of infection. Notably, even after adjusting for diabetes or renal failure, magnesium imbalance remained a significant factor (see S3.2 Table in S2 File and S4.2 Table in S3 File). Xie and Yang (2016) highlighted the role of magnesium in fighting S. aureus infection, given its antimicrobial action against the bacterial membrane [22]. Decubitus ulcers, common in patients who are immobilized, can lead to various types of infections. It’s known that S. aureus can colonize these wounds [24], which may result in S. aureus bacteremia [23,25]. In line with this, we found cases where decubitus ulcer diagnosis was noted before an SAI incident. Identifying these kinds of associations is crucial, as it can contribute to the prevention and early diagnosis of SAIs, ultimately improving patient care.
Some conditions caused by S. aureus, referred to here as S. aureus manifestations, are typically identified around the time of SAIs diagnosis or shortly thereafter. These include diseases like cellulitis [26], MRSA pneumonia [27], carbuncle [26], and pyogenic arthritis [28] among others. There are also symptoms such as limb swelling, erythema, and edema that are coded before an SAI diagnosis. The seeming discrepancy may be due to the fact that identifying and confirming S. aureus involves a confirmation by lab-based culture, which can cause reporting delays [29].
In our study, long-term use of aspirin was determined to be a risk factor for SAI. Known for its anticoagulant properties [30], aspirin use could heighten susceptibility to SAIs. Coagulation is a natural immune response to infection [31], so decreased coagulation might make an SAI more likely. It is also possible that aspirin is prescribed to patients with abnormal coagulation. We suggest that the driver of SAIs is abnormal coagulation, rather than aspirin use per se, as these patients may already be on aspirin. Interestingly, aspirin use in hemodialysis patients has been reported to lower the risk of SAIs [32]. Alternatively, since SAIs may also occur in patients who are chronically ill, aspirin usage may simply be a marker of chronic illness, specifically in patients with, e.g., cardiac, vascular, or neurologic diseases, as opposed to being directly related to SAI pathogenesis. In future focused studies, aspirin usage could be investigated by accounting for or excluding patients with confounding risk factors as mentioned above.
Our study verified two new associations, both of which were consistently observed in two additional cohorts, bringing the total to three cohorts confirming these novel complications. The first is iron deficiency anemia, which is physiologically linked to SAIs. S. aureus acquires iron from hemoglobin during invasive infections using its iron-regulated surface determinant receptor, IsdB [33]. This process aids in further invasion and persistence of S. aureus in the host, potentially leading to lower iron availability post-SAI. The second complication is anemia of chronic disease, previously reported as a risk factor for an SAI [9]. This condition may result from immune system changes affecting iron homeostasis due to bacterial, parasitic, or fungal infections, or even cancer [34]. However, our findings suggest that anemia is more likely a complication than a risk factor. This is supported by Jensen et al., who reported the risk of anemia and hyponatremia following hospital-acquired S. aureus bacteremia [9]. Musher and colleagues also noted anemia preceding pneumococcal pneumonia, including severe cases with bacteremia [35]. Similarly, a mouse model showed S. aureus infection causing leukopenia, lymphopenia, neutrophilia, monocytosis, and microcytosis, the latter of which can lead to anemia [36]. Yet, to our knowledge, there are no existing reports specifying the type of anemia associated with SAIs. Importantly, for anemia of chronic disease, the long-term and acute risk/complication ratios remained low even after adjusting for known risk factors (lifetime 0.67, n = 2426; acute 0.69, n = 608). This suggests that anemia of chronic disease is likely a genuine complication of SAIs. Alternatively, as with long-term use of aspirin, anemia may just be a marker of chronic illness in patients who have coincidentally had an SAI in the past. Further studies will be needed to address this.
In our study, seventeen phecodes associated with SAIs were categorized as cryptic (Table 2) as they showed different lifetime and acute risk/complication ratios. For example, endocarditis is a disease that can result from SAIs in both community and hospital settings, making both associations plausible. For instance, patients might be hospitalized due to primary endocarditis [19] resulting from infections originating in the community (pre-SAI phecode). Alternatively, they might develop endocarditis following a S. aureus bacteremia acquired in the hospital (post-SAI phecode) [37].
Although our PheWAS approach has uncovered novel associations with SAIs, it is not without limitations. First and foremost, we intentionally designed our study to be strictly observational and not to determine cause and effect. As such, it should be regarded as hypothesis generating only. An important limitation of our study (and of all studies based on ICD coding) is that there could be inconsistencies and variability in clinicians’ code selection. However, this limitation may be less relevant in our study because the phecode mapping system generalizes by clustering similar ICD-9 and ICD-10 codes into one phecode. The phecode mapping system has its own distinct set of limitations. One limitation inherent to the design of this study pertains to the simplification introduced by mapping ICD codes to phecodes. This mapping is meant to simplify association testing, as close to 60,000 ICD-9 and ICD-10 codes exist. An immediate drawback of mapping several ICD codes to one phecode is that it removes information that is relevant from a treatment standpoint. However, treatment is not the focus of this study; rather, the focus is to identify risk factors and complications. Another weakness of the phecode mapping system that pertains to the PheWAS of the age of onset is the inclusion of “history” codes. These codes indicate that a condition previously occurred, with no mention of when or at what age. This could confound the statistics from the PheWAS of the age of first onset if the “history” ICD code occurs frequently. However, we find that, at least for SAI, the usage of the “history” code is infrequent, so it is not likely to significantly impact the results. Another weakness related to the phecode system concerns its grouping into disease categories that may not be a good fit from an infectious disease point of view. For example, cellulitis, abscess, and decubitus ulcer are all considered "dermatologic" but are different conditions.
A similar mis-categorization along those lines is observed with osteomyelitis, MRSA pneumonia, and endocarditis, which are listed under musculoskeletal, respiratory, and circulatory systems, respectively, but not infectious. However, the key point in this disease category is that it reflects how organ systems are affected, and therefore, ones that are more prone to infections. One limitation of our study is the method we used to classify associations as either risk factors or complications. This classification should be considered within the context of the already understood pathophysiology of the disease. Sometimes a complication may appear as a risk factor due to the timing of phenotype or clinical identifier observation (e.g., cellulitis and increased white blood cell count) and coding in EHR. A fourth limitation concerns the generalization of our results to distinct ancestries. Most of the patients in the MCHS are of European ancestry and many have no reported ethnicity but are presumably European given known regional history. Our results are thus not generalizable to all ancestry but reflects that of white/European.
In conclusion, we have developed a unique PheWAS strategy to uncover a range of associations between various phenotypes and SAIs. Our study offers a comprehensive catalogue of hypotheses about phenotypes associated with SAIs, establishing a foundation for future SAI research that will hopefully benefit SAI prevention and treatment.
Supporting information
S1 Table.
S1.1 Table. Cohort demographics for PheWAS of phecode occurrences. S1.2 Table. Cohort demographics for PheWAS of phecode age of first onset.
https://doi.org/10.1371/journal.pone.0303395.s005
(XLSX)
S2 Table. Statistical models and SAI risk factors used.
https://doi.org/10.1371/journal.pone.0303395.s006
(XLSX)
S3 Table.
S3.1 Table. PheWAS results for SAI occurrences vs. phecode occurrences. Basic. S3.2 Table. PheWAS results for SAI occurrences vs. phecode occurrences. After removing patients with known SAI risk factors (only sentinel phecodes were screened).
https://doi.org/10.1371/journal.pone.0303395.s007
(XLSX)
S4 Table.
S4.1 Table. PheWAS results for SAI age of first onset vs. phecode age of first onset. Basic. S4.2 Table. PheWAS results for SAI age of first onset vs. phecode age of first onset. After removing patients with known SAI risk factors (only sentinel phecodes were screened).
https://doi.org/10.1371/journal.pone.0303395.s008
(XLSX)
S6 Table. Classification of sentinel phecodes into disease categories.
https://doi.org/10.1371/journal.pone.0303395.s010
(XLSX)
Acknowledgments
The authors would like to thank Dr. Daniel Musher, MD, a Distinguished Service Professor of Medicine-Infectious Disease at Baylor College of Medicine, Houston, TX, and Dr. Joseph John Jr, MD, a retired staff physician at VA Medical Center in Charleston, SC, for critically reviewing the manuscript. We would also like to thank Dr. David Puthoff for editorial assistance.
References
- 1. Denny JC, Ritchie MD, Basford MA, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010; 26(9):1205–1210. pmid:20335276
- 2. Haupert SR, Shi X, Chen C, Fritsche LG, Mukherjee B. A Case-Crossover Phenome-wide association study (PheWAS) for understanding Post-COVID-19 diagnosis patterns. J Biomed Inform. 2022; 136:104237. pmid:36283580
- 3. Lowy F. Staphylococcus aureus Infections. N Engl J Med. 1998; 339(8):520–32. pmid:9709046
- 4. Tak T, Reed KD, Haselby RC, McCauley CS, Shukla SK. An update on the epidemiology, pathogenesis and management of infective endocarditis with emphasis on Staphylococcus aureus. WMJ. 2002; 101(7):24–33. pmid:12426917
- 5. Dreyfus JG, Yu H, Begier E, Gayle J, Olsen MA. Incidence and burden of Staphylococcus aureus infection after orthopedic surgeries. Infect Control Hosp Epidemiol. 2021/05/26 ed. Cambridge University Press; 2022; 43(1):64–71. pmid:34034839
- 6. Hindy J-R, Quintero-Martinez JA, Lee AT, et al. Incidence Trends and Epidemiology of Staphylococcus aureus Bacteremia: A Systematic Review of Population-Based Studies. Cureus. Cureus, Inc.; 2022; 14(5):e25460. pmid:35774691
- 7. Stewardson AJ, Allignol A, Beyersmann J, et al. The health and economic burden of bloodstream infections caused by antimicrobial-susceptible and non-susceptible Enterobacteriaceae and Staphylococcus aureus in European hospitals, 2010 and 2011: a multicentre retrospective cohort study. Eurosurveillance [Internet]. 2016; 21(33). Available from: https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2016.21.33.30319 pmid:27562950
- 8. Klein EY, Jiang W, Mojica N, et al. National Costs Associated With Methicillin-Susceptible and Methicillin-Resistant Staphylococcus aureus Hospitalizations in the United States, 2010–2014. Clin Infect Dis. 2019; 68(1):22–28. pmid:29762662
- 9. Jensen AG, Wachmann CH, Poulsen KB, et al. Risk Factors for Hospital-Acquired Staphylococcus aureus Bacteremia. Arch Intern Med. 1999; 159(13):1437–1444. pmid:10399895
- 10. Kutlu SS, Cevahir N, Akalin S, et al. Prevalence and risk factors for methicillin-resistant Staphylococcus aureus colonization in a diabetic outpatient population: A prospective cohort study. Am J Infect Control. Elsevier; 2012; 40(4):365–368.
- 11. Acheson LS, Siefried KJ, Clifford B, et al. One-third of people who inject drugs are at risk of incomplete treatment for Staphylococcus aureus bacteraemia: a retrospective medical record review. Int J Infect Dis. Elsevier; 2021; 112:63–65. pmid:34520844
- 12. Kluytmans J, van Belkum A, Verbrugh H. Nasal carriage of Staphylococcus aureus: epidemiology, underlying mechanisms, and associated risks. Clin Microbiol Rev. American Society for Microbiology; 1997; 10(3):505–520. pmid:9227864
- 13. Miller LS, Fowler VG Jr, Shukla SK, Rose WE, Proctor RA. Development of a vaccine against Staphylococcus aureus invasive infections: Evidence based on human immunity, genetics and bacterial evasion mechanisms. FEMS Microbiol Rev. 2020; 44(1):123–153. pmid:31841134
- 14. Denny JC, Devaney SA, Gebo KA. The “All of Us” Research Program. Reply. N Engl J Med. 2019; 381(19):1884–1885.
- 15. Wei W-Q, Bastarache LA, Carroll RJ, et al. Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLOS ONE. Public Library of Science; 2017; 12(7):e0175508. pmid:28686612
- 16. Wang L, Zhang X, Meng X, et al. Methodology in phenome-wide association studies: a systematic review. J Med Genet. 2021; 58(11):720. pmid:34272311
- 17. Denny JC, Bastarache L, Ritchie MD, et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013; 31(12):1102–1111. pmid:24270849
- 18. Thurlow LR, Stephens AC, Hurley KE, Richardson AR. Lack of nutritional immunity in diabetic skin infections promotes Staphylococcus aureus virulence. Sci Adv. American Association for the Advancement of Science; 6(46):eabc5569. pmid:33188027
- 19. Fowler VG Jr, Olsen MK, Corey GR, et al. Clinical Identifiers of Complicated Staphylococcus aureus Bacteremia. Arch Intern Med. 2003; 163(17):2066–2072. pmid:14504120
- 20. Van den Bruel A, Thompson MJ, Haj-Hassan T, et al. Diagnostic value of laboratory tests in identifying serious infections in febrile children: systematic review. BMJ [Internet]. BMJ Publishing Group Ltd; 2011; 342. Available from: https://www.bmj.com/content/342/bmj.d3082 pmid:21653621
- 21. Martin B-J, Buth KJ, Arora RC, Baskett RJ. Delirium as a predictor of sepsis in post-coronary artery bypass grafting patients: a retrospective cohort study. Crit Care. 2010; 14(5):R171. pmid:20875113
- 22. Xie Y, Yang L. Calcium and Magnesium Ions Are Membrane-Active against Stationary-Phase Staphylococcus aureus with High Specificity. Sci Rep. 2016; 6(1):20628. pmid:26865182
- 23. Braga IA, Pirett CCNS, Ribas RM, Filho PPG, Filho AD. Bacterial colonization of pressure ulcers: assessment of risk for bloodstream infection and impact on patient outcomes. J Hosp Infect. Elsevier; 2013; 83(4):314–320. pmid:23313027
- 24. Fayolle M, Morsli M, Gelis A, et al. The Persistence of Staphylococcus aureus in Pressure Ulcers: A Colonising Role. Genes. 2021; 12(12). pmid:34946833
- 25. Dana AN, Bauman WA. Bacteriology of pressure ulcers in individuals with spinal cord injury: What we know and what we should know. J Spinal Cord Med. Taylor & Francis; 2015; 38(2):147–160. pmid:25130374
- 26. Hatlen TJ, Miller LG. Staphylococcal Skin and Soft Tissue Infections. Skin Soft Tissue Infect. 2021; 35(1):81–105. pmid:33303329
- 27. Woods C, Colice G. Methicillin-resistant Staphylococcus aureus pneumonia in adults. Expert Rev Respir Med. Taylor & Francis; 2014; 8(5):641–651.
- 28. Lim SY, Pannikath D, Nugent K. A retrospective study of septic arthritis in a tertiary hospital in West Texas with high rates of methicillin-resistant Staphylococcus aureus infection. Rheumatol Int. 2015; 35(7):1251–1256. pmid:25572838
- 29. Mehta MS, Paule SM, Thomson RB, Kaul KL, Peterson LR. Identification of Staphylococcus Species Directly from Positive Blood Culture Broth by Use of Molecular and Conventional Methods. J Clin Microbiol. American Society for Microbiology; 2009; 47(4):1082–1086. pmid:19213701
- 30. Montinari MR, Minelli S, De Caterina R. The first 3500 years of aspirin history from its roots—A concise summary. Vascul Pharmacol. 2019; 113:1–8.
- 31. Loof TG, Mörgelin M, Johansson L, et al. Coagulation, an ancestral serine protease cascade, exerts a novel function in early immune defense. Blood. 2011; 118(9):2589–2598. pmid:21613262
- 32. Sedlacek M, Gemery JM, Cheung AL, Bayer AS, Remillard BD. Aspirin Treatment Is Associated With a Significantly Decreased Risk of Staphylococcus aureus Bacteremia in Hemodialysis Patients With Tunneled Catheters. Am J Kidney Dis. Elsevier; 2007; 49(3):401–408. pmid:17336701
- 33. Pishchany G, McCoy AL, Torres VJ, et al. Specificity for Human Hemoglobin Enhances Staphylococcus aureus Infection. Cell Host Microbe. Elsevier; 2010; 8(6):544–550. pmid:21147468
- 34. Weiss G, Goodnough LT. Anemia of Chronic Disease. N Engl J Med. Massachusetts Medical Society; 2005; 352(10):1011–1023. pmid:15758012
- 35. Musher DM, Alexandraki I, Graviss EA, et al. Bacteremic and Nonbacteremic Pneumococcal Pneumonia A Prospective Study. Medicine (Baltimore) [Internet]. 2000; 79(4). Available from: https://journals.lww.com/md-journal/Fulltext/2000/07000/Bacteremic_and_Nonbacteremic_Pneumococcal.2.aspx
- 36. Santosa CM, Megarani DV, Arifianto D, Salasia SIO. The mice’s hematological effect of given the Staphylococcus aureus and Persea americana. BIO Web Conf [Internet]. 2021; 33. Available from:
- 37. Nissenson AR, Dylan ML, Griffiths RI, et al. Clinical and Economic Outcomes of Staphylococcus aureus Septicemia in ESRD Patients Receiving Hemodialysis. Am J Kidney Dis. Elsevier; 2005; 46(2):301–308. pmid:16112049
- 38. Smit J, Adelborg K, Thomsen RW, Søgaard M, Schønheyder HC. Chronic heart failure and mortality in patients with community-acquired Staphylococcus aureus bacteremia: a population-based cohort study. BMC Infect Dis. 2016; 16(1):227. pmid:27225712
- 39. Arvanitaki A, Ibrahim W, Shore D, et al. Epidemiology and management of Staphylococcus Aureus infective endocarditis in adult patients with congenital heart disease: A single tertiary center experience. Int J Cardiol. Elsevier; 2022; 360:23–28. pmid:35500817
- 40. Chira S, Miller LG. Staphylococcus aureus is the most common identified cause of cellulitis: a systematic review. Epidemiol Infect. 2009/08/03 ed. Cambridge University Press; 2010; 138(3):313–317. pmid:19646308
- 41. Eells SJ, Chira S, David CG, Craft N, Miller LG. Non-suppurative cellulitis: risk factors and its association with Staphylococcus aureus colonization in an area of endemic community-associated methicillin-resistant S. aureus infections. Epidemiol Infect. 2010/06/21 ed. Cambridge University Press; 2011; 139(4):606–612.
- 42. Swartz MN. Cellulitis. N Engl J Med. Massachusetts Medical Society; 2004; 350(9):904–912.
- 43. Yamasaki O, Kaneko J, Morizane S, et al. The Association between Staphylococcus aureus Strains Carrying Panton-Valentine Leukocidin Genes and the Development of Deep-Seated Follicular Infection. Clin Infect Dis. 2005; 40(3):381–385. pmid:15668860
- 44. Lina G, Piémont Y, Godail-Gamot F, et al. Involvement of Panton-Valentine Leukocidin-Producing Staphylococcus aureus in Primary Skin Infections and Pneumonia. Clin Infect Dis. 1999; 29(5):1128–1132. pmid:10524952
- 45. Masiuk H, Kopron K, Grumann D, et al. Association of Recurrent Furunculosis with Panton-Valentine Leukocidin and the Genetic Background of Staphylococcus aureus. J Clin Microbiol. American Society for Microbiology; 2010; 48(5):1527–1535.
- 46. Moore C, Davis NF, Burke JP, et al. Colonisation with methicillin-resistant Staphylococcus aureus prior to renal transplantation is associated with long-term renal allograft failure. Transpl Int. John Wiley & Sons, Ltd; 2014; 27(9):926–930. pmid:24853293
- 47. Nielsen LH, Jensen-Fangel S, Benfield T, et al. Risk and prognosis of Staphylococcus aureus bacteremia among individuals with and without end-stage renal disease: a Danish, population-based cohort study. BMC Infect Dis. 2015; 15(1):6. pmid:25566857
- 48. Draaijers L, Hassing R, Kooistra M, Van Kessel K, Hovens M. Severe Acquired Coagulopathy During Fulminant Staphylococcus aureus Sepsis Most Likely Caused by S. aureus Exotoxins (SSLs). Eur J Case Rep Intern Med. 2018; 5(12):0001002. pmid:30756003
- 49. Sorbie C. STAPHYLOCOCCAL SEPTICÆMIA. The Lancet. Elsevier; 1961; 278(7215):1284–1285.
- 50. Bader MS. Staphylococcus aureus Bacteremia in Older Adults: Predictors of 7-Day Mortality and Infection With a Methicillin-Resistant Strain. Infect Control Hosp Epidemiol. 2016/06/21 ed. Cambridge University Press; 2006; 27(11):1219–1225. pmid:17080380
- 51. Gofton TE, Young GB. Sepsis-associated encephalopathy. Nat Rev Neurol. 2012; 8(10):557–566. pmid:22986430
- 52. Lin W-T, Wu C-D, Cheng S-C, et al. High Prevalence of Methicillin-Resistant Staphylococcus aureus among Patients with Septic Arthritis Caused by Staphylococcus aureus. PLOS ONE. Public Library of Science; 2015; 10(5):e0127150. pmid:25996145
- 53. Rubinstein E, Kollef MH, Nathwani D. Pneumonia Caused by Methicillin-Resistant Staphylococcus aureus. Clin Infect Dis. 2008; 46(Supplement_5):S378–S385. pmid:18462093
- 54. Kavanagh N, Ryan EJ, Widaa A, et al. Staphylococcal Osteomyelitis: Disease Progression, Treatment Challenges, and Future Directions. Clin Microbiol Rev. American Society for Microbiology; 2018; 31(2): pmid:29444953
- 55. Miller LG, Quan C, Shay A, et al. A Prospective Investigation of Outcomes after Hospital Discharge for Endemic, Community-Acquired Methicillin-Resistant and -Susceptible Staphylococcus aureus Skin Infection. Clin Infect Dis. 2007; 44(4):483–492. pmid:17243049
- 56. Heal C, Harvey A, Brown S, Rowland AG, Roland D. The association between temperature, heart rate, and respiratory rate in children aged under 16 years attending urgent and emergency care settings. Eur J Emerg Med [Internet]. 2022; 29(6). Available from: https://journals.lww.com/euro-emergencymed/Fulltext/2022/12000/The_association_between_temperature,_heart_rate,.7.aspx pmid:35679531
- 57. Citla Sridhar D, Maher OM, Rodriguez NI. Pediatric Deep Venous Thrombosis Associated With Staphylococcal Infections: Single Institutional Experience. J Pediatr Hematol Oncol [Internet]. 2018; 40(2). Available from: https://journals.lww.com/jpho-online/Fulltext/2018/03000/Pediatric_Deep_Venous_Thrombosis_Associated_With.23.aspx pmid:29200147
- 58. Lu M, Fu M, Zhang Y, Shen T, Xie H, Liu D. Septicaemia with deep venous thrombosis and necrotising pneumonia caused by acute community-acquired methicillin-resistant Staphylococcus aureus in an infant with a three-year follow-up: a case report. BMC Infect Dis. 2022; 22(1):189. pmid:35209857
- 59. Jia Y, Wang W, Wang X, Jiao L, Wang Y. Staphylococcus aureus meningitis complicated with intracranial hemorrhage and cerebral infarction: a case report. Int J Neurosci. Taylor & Francis; 2022; 132(12):1221–1224. pmid:33491526
- 60. Dobrin RS, Day NK, Quie PG, et al. The role of complement, immunoglobulin and bacterial antigen in coagulase-negative staphylococcal shunt nephritis. Am J Med. Elsevier; 1975; 59(5):660–673. pmid:1106192
- 61. Koyama A, Kobayashi M, Yamaguchi N, et al. Glomerulonephritis associated with MRSA infection: A possible role of bacterial superantigen. Kidney Int. 1995; 47(1):207–216. pmid:7731148
- 62. Satoskar AA, Shapiro JP, Jones M, et al. Differentiating Staphylococcus infection-associated glomerulonephritis and primary IgA nephropathy: a mass spectrometry-based exploratory study. Sci Rep. 2020; 10(1):17179. pmid:33057112
- 63. Froberg MK, Palavecino E, Dykoski R, Gerding DN, Peterson LR, Johnson S. Staphylococcus aureus and Clostridium difficile Cause Distinct Pseudomembranous Intestinal Diseases. Clin Infect Dis. 2004; 39(5):747–750. pmid:15356793
- 64. Harriott MM, Noverr MC. Candida albicans and Staphylococcus aureus Form Polymicrobial Biofilms: Effects on Antimicrobial Resistance. Antimicrob Agents Chemother. American Society for Microbiology; 2009; 53(9):3914–3922.
- 65. Schlecht LM, Peters BM, Krom BP, et al. Systemic Staphylococcus aureus infection mediated by Candida albicans hyphal invasion of mucosal tissue. Microbiology. Microbiology Society; 2015. p. 168–181. pmid:25332378
- 66. Kosami K, Kenzaka T, Sagara Y, Minami K, Matsumura M. Clinically mild encephalitis/encephalopathy with a reversible splenial lesion caused by methicillin-sensitive Staphylococcus aureus bacteremia with toxic shock syndrome: a case report. BMC Infect Dis. 2016; 16(1):160. pmid:27091490
- 67. Cohen SP, Wang EJ, Doshi TL, Vase L, Cawcutt KA, Tontisirin N. Chronic pain and infection: mechanisms, causes, conditions, treatments, and controversies. BMJ Med. 2022; 1(1):e000108. pmid:36936554
- 68. Mölkänen T, Ruotsalainen E, Rintala EM, Järvinen A. Predictive Value of C-Reactive Protein (CRP) in Identifying Fatal Outcome and Deep Infections in Staphylococcus aureus Bacteremia. PLOS ONE. Public Library of Science; 2016; 11(5):e0155644. pmid:27182730
- 69. Botheras CL, Bowe SJ, Cowan R, Athan E. C-reactive protein predicts complications in community-associated S. aureus bacteraemia: a cohort study. BMC Infect Dis. 2021; 21(1):312. pmid:33794783