Genome-Wide Association Study of Staphylococcus aureus Carriage in a Community-Based Sample of Mexican-Americans in Starr County, Texas

Staphylococcus aureus is the number one cause of hospital-acquired infections. Understanding host-pathogen interactions is paramount to the development of more effective treatment and prevention strategies. Therefore, whole exome sequence and chip-based genotype data were used to conduct rare variant and genome-wide association analyses in a Mexican-American cohort from Starr County, Texas to identify genes and variants associated with S. aureus nasal carriage. Unlike most studies of S. aureus that are based on hospitalized populations, this study used a representative community sample. Two nasal swabs were collected from participants (n = 858) 11–17 days apart between October 2009 and December 2013 and screened for the presence of S. aureus, and participants were then classified as persistent, intermittent, or non-carriers. The chip-based and exome sequence-based single variant association analyses identified 1 genome-wide significant region (KAT2B) for intermittent carriage and 11 regions suggestively associated with persistent or intermittent S. aureus carriage. We also report top findings from gene-based burden analyses of rare functional variation. Notably, we observed marked differences between signals associated with persistent and intermittent carriage. In single variant analyses of persistent carriage, 7 of 9 genes in suggestively associated regions and all 5 top gene-based findings are associated with cell growth or tight junction integrity or are structural constituents of the cytoskeleton, suggesting that variation in genes associated with persistent carriage impacts cellular integrity and morphology.


Introduction
Infectious diseases result from complex interactions between the microorganism, the host, and the environment. Host genetic factors play a major role in determining differential susceptibility to major infectious diseases of humans, including malaria [1], HIV/AIDS [2], tuberculosis [3], hepatitis B [4], norovirus diarrhea [5], prion disease [6], cholera [7], and Helicobacter pylori infections [8]. The first evidence that genetic factors could impact infectious disease outcomes was derived from epidemiological studies that identified differences between human populations exposed to the same infectious organism [9]. This is equally true for S. aureus [10][11][12], but this pathogen represents a special case because it is an opportunistic pathogen that can colonize humans without causing overt disease [13]. It is therefore an ideal system for examining host-pathogen interactions.
Even though humans are exposed to S. aureus at birth, not all are equally susceptible to colonization [9]. Many body sites can be colonized by S. aureus, but nasal decolonization has been shown to be effective in reducing colonization at other body sites, suggesting that the anterior nares are one of the primary S. aureus reservoirs [14,15]. Human carriage has been classified as either persistent, intermittent, or non-carriage with rates of carriage ranging from 10-35%, 20-75%, and 5-70%, respectively, depending on race, age, gender, and whether the population examined was hospital- or community-based [9][16][17][18]. Carriage is not representative of infection, per se. Rather, carriage impacts the risk of acquiring infection, disease presentation, and disease severity [13]. Furthermore, the genotype of the colonizing S. aureus strain, the nature of the immune response elicited following exposure, and underlying host genetic factors may all play a role in susceptibility to colonization and/or infection [9][19][20][21][22][23][24]. Like other complex conditions, susceptibility to infectious agents does not typically follow a simple Mendelian pattern of inheritance, largely because human immune responses are controlled by complex genetic mechanisms and modifying environmental influences [25,26].
Candidate gene studies have uncovered associations between specific genes and carriage status [20][21][22][23][27][28][29]. For example, IL4 and C-reactive protein have been shown to be associated with carriage in the Rotterdam Study [20,22]. In the same study, a 68% reduction in risk of persistent carriage was observed related to the glucocorticoid receptor gene [30] (S1 Table). Polymorphisms in genes encoding different defensins and MBL (mannose-binding lectin) have also been associated with S. aureus persistent carriage [20, 31, 32] (S1 Table). The toll-like receptors have also been associated with increased risk of streptococci and enterococci skin and soft tissue infections [21,33], suggesting that there may be some commonalities in the genetics of susceptibility to infection with different pathogens. No community-based genome-wide association or whole exome sequencing studies have previously been performed in the context of S. aureus carriage, but recently, 2 hospital-based genome-wide association studies of S. aureus infections were conducted [34,35]. That these studies failed to identify targets with genome-wide significance is not necessarily surprising, since hospital environments themselves are a significant risk factor for acquiring S. aureus infections and these effects may overwhelm modest genetic influences on risk [36].
The present study was designed to identify genes/markers associated with persistent and intermittent carriage of S. aureus in a community-based sample of 858 Mexican-Americans from Starr County, Texas. Single nucleotide polymorphism (SNP) data from the Affymetrix Genome-Wide SNPArray 6.0 assay, imputed out to the complete SNP set in the 1000 Genomes Project [37], and whole exome sequence data were used to conduct single variant and gene-based burden tests. The single variant analyses identified the KAT2B (lysine acetyltransferase 2B) region as significantly associated with intermittent S. aureus carriage. All 5 top genes identified in the gene-based burden test and at least 1 gene in each region suggestively associated with persistent carriage in the single variant analysis are associated in some fashion with maintenance of cellular integrity, the cytoskeleton, or the cell cycle. On the other hand, genes associated with intermittent carriage were largely associated with immune function, adipogenesis, or inflammation. These analyses identified little evidence of overlap between genes or regions corresponding to different carriage phenotypes, suggesting that each carrier state may be distinct.

Human subjects
This study and the consenting procedures were approved by the University of Texas Health Science Center Institutional Review Board (HSC-SPH-06-0225). Written informed consent was obtained from all participants before they were enrolled in the study.

Microbiologic testing
Specimens were collected from the nares using dry, unmoistened sterile BBL™ CultureSwabs™ Liquid Stuart swabs. Swabs were inserted into the patient's nostril approximately 1 inch from the edge of the anterior nares, placing the swab in proximity with the inferior and middle concha, and rolled several times. Bar-coded specimen tubes were stored and shipped at 4°C to the University of Texas Health Science Center at Houston School of Public Health for processing.
To identify and characterize S. aureus from specimens containing mixed flora, nasal swabs were inoculated on mannitol salt agar (MSA) plates (Remel Inc., Lenexa, KS) as described [38]. Following inoculation of primary plates, swabs were broken off into tryptic soy broth (TSB) (Remel Inc.) for enrichment. The enrichment broths were vortexed for 10 seconds to ensure that any bacteria still attached to the swab were released into the media, and the samples were subsequently incubated at 37°C for 48 hours and re-plated on secondary MSA plates. Gram staining of colonies that turned MSA plates yellow was used to ensure that selected colonies possessed S. aureus morphology. Presumptive S. aureus colonies were streaked on blood agar (BA) (Quad Five, Ryegate, MT) and TSB agar and incubated at 37°C for 24 h.
Following incubations on BA and TSB agar from the primary and secondary MSA plates, colonies were subjected to catalase (Sigma, St. Louis, MO) and coagulase testing (BactiStaph® Latex 450, Remel Inc.). Positive tests were considered diagnostic for S. aureus. The identification of S. aureus was also confirmed genetically by PCR amplification and sequencing of a fragment of the spa gene for 1598/1662 (96%) of the isolates as done previously [38]. The second MSA plates streaked from the overnight liquid broth cultures were examined for additional growth, and colonies with S. aureus morphology were isolated and tested as above. Once isolates were defined as S. aureus, their respective susceptibilities to methicillin were determined using the E-test® (AB Biodisk, Biomerieux, l'Etoile, France). Methicillin resistance was defined by growth at antibiotic concentrations ≥ 4 μg/ml. All confirmed S. aureus isolates were stored at -80°C [38].

Definition of the S. aureus carriage phenotypes
Carriage status was determined for individuals from whom nasal swabs were collected at two time points, 2 weeks apart (14±3 days) as described previously [39]. Persistent carriers were defined by S. aureus positive cultures at both visits, and intermittent carriers were S. aureus positive at either the first or second visit but not both. Non-carriers were negative for S. aureus at both visits [39].
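The three-way classification can be written as a short rule on the two culture results. The following is a minimal sketch only; the function name and the boolean inputs are illustrative assumptions and are not part of the study's software.

```python
def classify_carriage(visit1_positive: bool, visit2_positive: bool) -> str:
    """Classify nasal carriage from two S. aureus culture results.

    Hypothetical helper: the two inputs are the culture results from the
    swabs taken roughly 2 weeks apart.
    """
    n_positive = int(visit1_positive) + int(visit2_positive)
    if n_positive == 2:
        return "persistent"      # positive at both visits
    if n_positive == 1:
        return "intermittent"    # positive at exactly one visit
    return "non-carrier"         # negative at both visits


if __name__ == "__main__":
    for v1, v2 in [(True, True), (True, False), (False, True), (False, False)]:
        print(v1, v2, classify_carriage(v1, v2))
```

Counting positives rather than branching on each visit makes it explicit that the order of the positive swab does not matter for the intermittent class.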
Genome-wide association studies and generation of whole genome imputation data
Subjects (n = 858) were eligible for this study because of prior participation in genome-wide association studies for diabetes [40]. Genotyping was performed at the Center for Inherited Disease Research using the Affymetrix Genome-wide SNPArray 6.0 assay with sample- and SNP-level genotyping quality control performed as described in Below et al. [40]. Imputations were carried out in the full Starr County sample, cleaned of ethnic outliers and including 1,616 unrelated (pairwise identity by descent ≤ 0.3) [41] individuals, of which 858 met inclusion criteria for the present study. A set of autosomal scaffold SNPs was selected to drive imputation by excluding those with: 1) minor allele frequency <1%, 2) Hardy-Weinberg p-values < 10⁻⁴ in the full sample, 3) missingness >10% in the full sample, and 4) ambiguous strand (AT/CG) alleles. Individual-level missingness was <5% in all samples. A total of 603,042 scaffold SNPs were carried forward into a two-step imputation strategy: i) pre-phasing using the program SHAPEIT [42] and ii) imputation from the reference panel into the estimated haplotypes with IMPUTE v2 [43][44][45]. Imputations were done in conjunction with the T2D-GENES consortium as part of a larger set of some 13,000 multiethnic samples. SNPs with imputation quality ≥ 0.8 and minor allele frequency > 0.05 were carried forward for single variant analyses. Population stratification was evaluated using EIGENSOFT on a subset of directly genotyped SNPs pruned for local and long distance linkage disequilibrium as described in Patterson et al. [46].
Analyses were conducted by comparing persistent S. aureus carriers to noncarriers or intermittent carriers to noncarriers. Persistent carriers were defined as unrelated [41] individuals passing genotyping quality control and testing positive for colonization of S. aureus at both of two time points, 11 to 17 days apart (n = 141). Genes located within 50 kilobases of signals comprised of at least 4 SNPs with study-based minor allele frequency > 0.05 and a p value < 10⁻⁵ were considered suggestively significant. For each region showing association, we identified a sentinel marker, defined as the most significant SNP meeting all quality control thresholds (locus zoom plots, Figs A-L, in S1 File).
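The four scaffold-SNP exclusion criteria can be expressed as a single filter. This is a sketch under stated assumptions: the `Snp` record, its field names, and `is_scaffold_snp` are hypothetical and do not correspond to any tool used in the study; only the thresholds are taken from the text.

```python
from dataclasses import dataclass

# Allele pairs whose strand cannot be resolved from the alleles alone.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class Snp:
    """Hypothetical per-SNP summary record (field names are assumptions)."""
    rsid: str
    allele1: str
    allele2: str
    maf: float           # minor allele frequency in the full sample
    hwe_p: float         # Hardy-Weinberg p-value in the full sample
    missing_rate: float  # fraction of samples with a missing genotype


def is_scaffold_snp(snp: Snp) -> bool:
    """Return True if the SNP passes all four exclusion criteria."""
    if snp.maf < 0.01:            # 1) minor allele frequency < 1%
        return False
    if snp.hwe_p < 1e-4:          # 2) Hardy-Weinberg p-value < 10^-4
        return False
    if snp.missing_rate > 0.10:   # 3) missingness > 10%
        return False
    alleles = frozenset((snp.allele1.upper(), snp.allele2.upper()))
    if alleles in AMBIGUOUS_PAIRS:  # 4) ambiguous strand (A/T or C/G)
        return False
    return True


if __name__ == "__main__":
    example = Snp("rs_example", "A", "G", maf=0.12, hwe_p=0.4, missing_rate=0.01)
    print(is_scaffold_snp(example))
```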
Associations of the imputed genetic markers with S. aureus carrier status were tested with the program SNPTEST v2 [44] using frequentist association tests, based on an additive model. To control for genotype uncertainty, we used the missing data likelihood score test (the score method). All association analyses corrected for ancestry using the first and second principal components from EIGENSOFT as covariates, and all analyses were run once including diabetes status as a covariate and once excluding diabetes status in the model.
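For orientation, an additive case-control model with these covariates is conventionally written as the logistic regression below. This is a generic sketch rather than the exact likelihood implemented in SNPTEST; the symbols are assumptions introduced here: Y_i is carrier status (1 for carriers, 0 for non-carriers), G_i is the count of the coded allele (0, 1, or 2), PC1_i and PC2_i are the first two principal components, and D_i is diabetes status, which enters only in the analyses that include it. For imputed SNPs, G_i is not observed directly, and the score test accounts for this by averaging over the imputed genotype probabilities.

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_G\, G_i + \beta_1\, \mathrm{PC1}_i + \beta_2\, \mathrm{PC2}_i
    \;\bigl[\, + \beta_D\, D_i \,\bigr],
\qquad H_0\colon \beta_G = 0 .
```

Under this coding, exp(β_G) is the per-allele odds ratio.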

Generation and analysis of whole exome single variants
Whole exome sequence data were available for a subset of 792 participants (131 persistent carriers, 88 intermittent carriers, and 573 non-carriers, as defined above). These were part of a larger group sequenced as part of the T2D-GENES Consortium at the Broad Institute using Agilent Truseq capture reagents on Illumina HiSeq2000 instruments.
Association tests of the 1,011 common (minor allele frequency > 0.05) single variants present in the exome sequence data were performed using logistic regression in the program PLINK v2 [47]. As above, association analyses were corrected for ancestry using the first and second principal components, and all analyses were run including and excluding diabetes status as a covariate in the model. These results were combined with the imputed data results in common Manhattan plots (Figs 1 and 2 and Figs M-N in S1 File).

Gene-based analysis of whole exome sequence data
We used the Variant Annotation Analysis and Search Tool (VAAST) to identify genes associated with increased risk of S. aureus colonization [48,49]. For quality-control purposes, we removed sites with missingness >10% in the full sample. We also used the rate option to set the maximum expected disease allele frequency to 0.05, as we expect to be powered to detect effects of common variants in single variant tests. The top two principal components from EIGENSOFT were used as covariates in all VAAST analyses, and analyses were performed with and without diabetes status as a covariate, as above. Statistical significance was assessed using a covariate-adjusted randomization test as previously described [50,51]; p-value confidence intervals were calculated using a Poisson approximation based on the number of successes in the randomization test. Genome-wide significance thresholds for the gene-based tests were calculated from the number of genes tested (0.05/18665 = 2.68×10⁻⁶).
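The gene-based threshold is a Bonferroni correction of a 0.05 significance level over the 18,665 genes tested. The arithmetic can be checked with a minimal sketch; the function name is illustrative and not taken from VAAST.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


if __name__ == "__main__":
    # 0.05 family-wise level divided by the 18,665 genes tested
    print(f"{bonferroni_threshold(0.05, 18665):.2e}")  # prints 2.68e-06
```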

Results
Population demographics and S. aureus carriage determination
Carriage status was established by collecting and analyzing swabs for the presence of S. aureus on 2 occasions from a single nostril 11-17 days apart on 858 Mexican Americans from Starr County, TX, USA [39]. A summary of demographic information for these individuals, who were eligible due to prior participation in a genome-wide association study for type 2 diabetes, is presented in Table 1 [40]. Participants testing positive for S. aureus on 2 separate occasions were defined as persistent carriers (n = 141), participants testing positive once were defined as intermittent carriers (n = 97), and participants testing negative on both occasions were defined as non-carriers (n = 620) as previously described [38,39].

Single variant association tests
Single variant association tests of persistent S. aureus carriage identified 5 loci as suggestively significant (p value < 10⁻⁵, as defined in the Methods); these are summarized in Fig 1 and Table 2, namely MKLN1 (muskelin 1), SORBS1 (sorbin and SH3 domain containing 1), SLC1A2 (solute carrier family 1), a region intergenic between EPB41L4B (erythrocyte membrane protein band 4.1 like 4B) and PTPN3 (cytoskeletal-associated protein tyrosine phosphatase), and a region downstream of FGF3 (fibroblast growth factor 3). MKLN1 encodes an intracellular mediator of cell morphology and cytoskeletal responses [52,53]. SORBS1 is involved in insulin signaling and SLC1A2 is a member of the solute transporter family. EPB41L4B and PTPN3 are involved in membrane-cytoskeletal interactions, while FGF3 is a member of the fibroblast growth factor family of genes. MKLN1 has been previously associated with childhood asthma [54], SORBS1 with suicide risk (46) and childhood obesity in Hispanics [55], SLC1A2 with fatty acid levels [56], essential tremor [57][58][59], and other traits [58,59], EPB41L4B with wound healing [60], PTPN3 with cancer [61], and FGF3 with breast cancer [62] and deafness [63,64]. Whole exome sequencing identified 1,011 common variants (minor allele frequency > 0.05). These are shown as x's in the Manhattan plots. None of these variants reached a suggestive level, indicating that it is unlikely that there are common protein-coding variants of substantial effect. LocusZoom [65] plots for each top locus highlight LD (linkage disequilibrium) patterns among the top SNPs and show multiple SNPs in LD blocks being associated (Figs A-E in S1 File).
In addition, we carried out single variant association tests of intermittent carriage of S. aureus, comparing individuals testing positive for S. aureus colonization at one visit but not both to non-carriers (Fig 2, Table 2). The 7 regions suggestively associated (as defined above) with intermittent carriage include a genome-wide significant finding on chromosome 3 at rs61440199 (p value 8.68 × 10⁻⁹) that is intronic to KAT2B (lysine acetyltransferase 2B; also known as PCAF, p300/CBP-associated factor), a gene associated with post-traumatic stress disorder [66], mean arterial blood pressure [67], adipogenesis [68], and development of T regulatory cells [69], and recently shown to be a potential regulator of inflammatory responses following infection with S. aureus in a mouse model of disease (Table 2) [70]. Other signals were at or near UBE2E2 (ubiquitin-conjugating enzyme E2E 2), a gene that has been associated with risk of gestational and type 2 diabetes [71][72][73], ICK (intestinal cell [MAK-like] kinase), and ROBO1 (roundabout, axon guidance receptor, homolog 1), which encodes a member of the immunoglobulin gene superfamily and plays a role in axon guidance and neuronal precursor cell migration (Table 2). A SNP highly correlated with ROBO1 expression in the brain has been reproducibly associated with reading disabilities [74,75], and SNPs mapped to ROBO1 have been associated with levels of liver enzymes [76] and other pQTLs [77]. Three sentinel SNPs were intergenic: between RELL1 (RELT-like 1) and PGM2 (phosphoglucomutase 2), between LOC283585 and GALC, and between ZNF532 (zinc finger protein 532) and MALT1 (mucosa associated lymphoid tissue lymphoma translocation gene 1) (Table 2). GALC encodes the enzyme β-galactocerebrosidase, mutations in which are responsible for Krabbe disease [78,79]. Homozygous mutations in MALT1 have been associated with immunodeficiency [80][81][82] (Table 2). MALT1 has also been associated with multiple sclerosis [83].
No common (minor allele frequency > 0.05) variants in the whole exome sequencing data reached p value < 10⁻⁵ (shown as x's in the Manhattan plot, Fig 2). LocusZoom plots for each top locus highlight LD patterns among top SNPs (Figs F-L in S1 File). It is notable that the signals for persistent carriage of S. aureus appear to be largely independent of signals for intermittent carriage of S. aureus. Of all top findings, only rs61440199 (KAT2B) and rs16993852 (RELL1) show nominal evidence of association in both persistent and intermittent carriage of S. aureus. Diabetes stratified and non-stratified analyses of both persistent and intermittent carriage gave highly concordant results across all analyses (Figs M-N in S1 File).

Gene-based tests of functional variants
The program VAAST [49,50] was used to identify genes enriched for functional rare variation in cases based on next generation whole exome sequence data. In the analysis of persistent carriers (131 cases, Table 3 and gene functions are discussed below. As in the single variant analyses of 1000 Genomes imputed data and common variants from the exome sequence data, the gene-based findings appear to be largely independent between the persistent and intermittent carriage groups (Table 3). Only genes CCDC69 and ZNF280D reach nominal levels of significance (p value < 0.05) in both tests. CSF2RB shows suggestive enrichment of missense variation in both analyses, and may constitute a gene involved in general S. aureus carriage susceptibility (Table 3). Shared signals should be interpreted with caution given that the non-carriage control group is the same in both tests, and thus the two tests are not strictly independent. As observed for the single variant association tests, diabetes stratified and non-stratified analyses gave highly concordant results across all analyses (Figs Q-R in S1 File). Manhattan and QQ plots suggest that the type 1 error for both single variant and gene-based tests is well controlled (Figs M-R in S1 File).
We used the Disease Association Protein-Protein Link Evaluator (DAPPLE) [84] to identify interactions between proteins encoded by the top 5 candidate genes in the persistent versus non-carrier and intermittent versus non-carrier VAAST runs. DAPPLE searches for protein-protein interactions among a candidate gene list; a significant number of protein-protein interactions may indicate a shared molecular pathway relevant to S. aureus susceptibility. In the analysis of persistent carriers versus non-carriers we did not detect any direct protein-protein interactions. However, among the top 5 genes identified from the intermittent carriers versus non-carriers run, we found that TPO directly interacts with CSF2RB (Fig S in S1 File). The p-value for observing at least one interaction among the top 5 genes is 0.008; the p-values for observing at least one interacting protein for TPO and CSF2RB are 0.015 and 0.014, respectively.

Replication of previously identified loci
We compared our persistent carriage single variant and gene-based test results to all sites previously reported in genetic analyses of S. aureus (S1 Table) [20-23, 27, 30-32, 34, 35,85]. When the variant in question was not present in our post-quality control imputed or exome sequenced variant lists, and therefore not analyzed in this study, we identified the best proxy variant by assessing linkage disequilibrium patterns in the Mexican-American (MEX) reference population within the 1000 Genomes Project data (release 27). In these cases, statistics for the variant with the highest linkage disequilibrium r² are provided in S1 Table. With the exception of CDK7 (discussed below), our findings do not replicate the genes and variants described in 2 previously conducted genome-wide association studies, possibly because of several differences between these prior studies and the current study (described in the Discussion) [34,35]. We found suggestive evidence of association at rs4918120 (p value 0.034), a SNP previously identified by Nelson et al. [34] in Caucasian inpatients; however, we observed the opposite direction of effect of the T allele (odds ratio 0.70 versus 1.68, see S1 Table). Interrogation of our single variant test results for intermittent carriage at previously reported loci yielded replication at three loci identified by Ye et al. [35]: rs12696090 (p value 0.0214), rs7643377 (p value 0.0081), and rs9867210 (p value 0.0079); however, as before, we find the opposite direction of effect at each of these loci (S1 Table).
We also examined our gene-based test results for replication of previous findings at genes near previously associated SNPs and genes. CDK7 (cyclin-dependent kinase) (gene-based p value 0.040) replicates findings by Ye et al. [35] who studied genetic risk of hospital-based S. aureus infection in Caucasians and identified CDK7 using gene-based tests in the program VEGAS (S1 Table).

Discussion
This was the first genome-wide association study of S. aureus carriage states in a community-based representative population. This approach is significantly different from previously described genome-wide association studies that were carried out in the context of S. aureus infections [34,35]. We found genome-wide significance at 1 gene region and 11 other regions meeting suggestive levels of significance for association with persistent and intermittent carriage states by single variant analysis. We also reported the 5 top findings from gene-based tests of persistent and intermittent carriage. The lack of overlap in signals between gene-based tests of rare functional variation and single-variant tests suggested that genome-wide association signals were not driven by coding sequence variation. Non-genic regulatory factors affecting gene expression levels or post-translational modifications may also affect carriage phenotypes.
We found that top signals associated with persistent and intermittent carriage captured genes of different cellular functions. Genome-wide single variant analysis identified 5 gene regions suggestively associated with persistent carriage. Gene-based rare variant analysis identified 5 genes in association with persistent carriage. Near genome-wide significance was observed only for FAM123C (p value < 6.50 × 10⁻⁶). Each of these genes (except for TSGA10IP, which has not been previously described to our knowledge) was involved with cellular growth, tissue homeostasis, and/or cancer [86][87][88][89][90][91][92]. It should be noted, however, that TSGA10IP (TSGA10 interacting protein) interacts with TSGA10, a protein that is also associated with cancer and that binds cytoskeletal proteins (e.g., vimentin and actin-γ1) [93,94].
In analyses of persistent S. aureus carriage, all of the top 5 findings from gene-based tests and all regions identified in the single variant analysis harbored at least 1 gene associated with either regulation of cell growth or maintenance of cellular integrity (e.g., tight junctions) [95,96]. Conversely, a minority of genes identified in previous genome-wide association studies of S. aureus infection were involved in cell cycle, cellular growth, or cellular integrity (S1 Table) [34, 35]. These differences are important for 2 reasons: i) carriage and infection are not mutually exclusive i.e., the S. aureus carriage status of individuals was not established in relation to the infections described in the previous genome-wide association studies, and ii) susceptibility to infections in hospital environments may not accurately reflect an individual's susceptibility to an infectious agent. Hospital environments in and of themselves place patients at increased risk for infections with numerous pathogens including S. aureus, an agent responsible for more healthcare-associated infections and surgical site infections than any other pathogen [97].
Genome-wide association analysis of intermittent carriage identified a different set of genes from those identified in association with persistent carriage. This analysis identified 7 gene regions. The top signal (rs61440199) was genome-wide significant (p value 8.68 × 10⁻⁹) and intronic to KAT2B. This gene was of particular interest since its expression in mice was affected by the nature of the infecting S. aureus strain [70]. In addition, KAT2B has been linked to immune function, cancer progression, and adipogenesis [68][98][99][100]. The association of KAT2B with cancer progression/cell cycle was also shared by SGOL, ROBO1, and ICK, and represents the only functional overlap between genes identified with persistent and intermittent carriage of S. aureus [101][102][103][104][105][106][107]. The other themes observed among genes associated with intermittent carriage were adipogenesis and inflammation/immunity (KAT2B, ZNF532, RELL1, FOXO9, MALT1) [68,80,98,100,101,103][108][109][110][111][112]. In light of sample ascertainment for diabetes in this cohort [40], a gene in 1 region, UBE2E2, was of interest because of prior associations with diabetes risk [71][72][73]; however, stratification for diabetes provided highly concordant results with the unstratified analysis (data not shown). Our gene-based analyses did not model complications that present in diabetic patients (e.g., obesity, immune function, elevated blood glucose levels) that may alter susceptibility to intermittent carriage. The number of adipogenesis genes linked to intermittent carriage may be of significance in light of recent studies that identified a protective role for adipose tissue in a murine model of S. aureus skin infection, suggesting that immune factors produced by adipose tissues (e.g., antimicrobial peptides) may play a role in intermittent carriage [112].
Although gene-based analyses of rare functional variants failed to identify any genome-wide significant differences in association with intermittent carriage, a top signal, CSF2RB, demonstrated concordance of burden in both persistent and intermittent carriers (p value 7.04 × 10⁻⁴ and p value < 4.15 × 10⁻⁴, respectively). CSF2RB codes for CD131, the common β receptor subunit for IL-3, IL-5, and GM-CSF (granulocyte/monocyte colony stimulating factor) that in mice was shown to play a role in regulating Th2 type immune responses [113]. In addition, CD131 stimulated the recruitment of neutrophils (which are a key innate immune component) and controlled the homeostasis of tissue dendritic cells [114,115]. Furthermore, DAPPLE analysis identified a significant protein-protein interaction between the CSF2RB and TPO gene products. TPO is critical to the production of thyroid hormones that can impact immune function and is also associated with mucinosis (myxedema), a disease characterized by increased glycosaminoglycan deposition in the skin [116,117]. Other than CSF2RB, no top finding in the gene-based tests was even modestly associated with both persistent and intermittent carriage.
Results from the 2 previously described genome-wide association studies identified a number of loci with statistical significance. However, those associations were for the most part not replicated in our studies or previous work [9, 18-23, 27, 30, 32, 34, 35, 85, 118]. Lack of replication between studies may be due to population differences, the impact of the respective colonizing/infecting S. aureus strains (and their relationship with distinct human genetic determinants), study design (i.e., S. aureus infection versus carriage), and the size of respective populations examined [9,22]. Replication of 1 gene identified by gene-based tests was observed in the context of persistent carriage that identified CDK7 (p value 0.041) from the VEGAS gene test conducted by Ye et al. (S1 Table). We also assessed gene-based evidence of replication in our analyses of intermittent carriage versus non-carriage and found no support for previously identified genes (data not shown).
Previous colonization studies have suggested that the 3 described staphylococcal carriage phenotypes (persistent, intermittent, and non-carriers) be modified to include only 2 carriage phenotypes: persistent carriers and intermittent/non-carriers [17]. However, gene targets identified in the present S. aureus carriage genome-wide association study suggested that each phenotype is distinct. That the genome-wide association and rare variant analyses identified relatively little functional similarity between persistent and intermittent carriers may suggest underlying differences between these 2 carriage states. An alternate explanation is that these studies lacked sufficient power to identify common factors across the carriage states. Despite the recommendation of previous studies to consider intermittent and non-carriers as a single group, this reclassification would require ignoring the differences that exist between these 2 carriage states. It is clear, however, that persistent carriers represent the most distinct carriage state. This is supported by colonization studies demonstrating that non-carriers (and decolonized intermittent carriers) artificially inoculated with S. aureus in the nares cleared the bacteria over a similar time period (4 days for non-carriers and 14 days for intermittent carriers), whereas persistent carriers (decolonized and then re-inoculated) still harbored the S. aureus inoculum >154 days later [17]. Persistent carriers also had a different antibody profile against some staphylococcal virulence factors compared to the indistinguishable profile described for non-carriers and intermittent carriers [17]. In addition, persistent carriers that were decolonized and re-inoculated with a heterogeneous mix of S. aureus isolates were more likely to be re-colonized with their original colonizing isolate, suggestive of an intimate association between the colonizing strain and the host [17].
This difference between persistent carriers and intermittent carriers (and between intermittent carriers and non-carriers) is further accentuated by the function of the genes associated with the respective carriage states. Almost all determinants associated with persistent carriage were associated with cellular integrity, morphology, and growth, functions that can directly affect the host/pathogen interface and establish environments permissive to persistent carriage.
Attachment to host surfaces is requisite for colonization and infection of host tissues by pathogens. S. aureus possesses an arsenal of adhesins capable of binding an array of host extracellular matrix (ECM) components. These components include fibrinogen, fibronectin, collagen, cytokeratin 10, elastin, heparan sulfate proteoglycans, von Willebrand factor, bone sialoprotein, vitronectin, and prothrombin, which together facilitate the colonization of diverse tissues and account in part for the myriad of diseases that can result following infection with this pathogen [119][120][121]. It is not surprising, therefore, that host polymorphisms potentially affecting cellular integrity, morphology, and growth could also impact colonization with different pathogens or strains of the same pathogen.
That several candidate genes identified by the genome-wide association study (e.g., ALDH18A1, EPB41L4B, FGF3, and FGF4) and all but 1 gene identified in the rare variant analysis have been shown to possess tumorigenic potential should not be surprising, since genes that play roles in the progression of various cancers also play critical roles in wound healing, cellular migration, cellular integrity, and angiogenesis [60,122,123]. Polymorphisms in these genes, or in any gene whose product can alter the structural integrity of the host cell, could potentially impact staphylococcal colonization.
Focal adhesions are large, multi-protein complexes closely associated with cell surface integrins, which span the eukaryotic plasma membrane and link the cellular cytoskeleton to the ECM surrounding the cell [124]. Most integrins and their respective focal adhesions are expressed in the epidermis and regulate epithelial cell homeostasis by mediating cell adhesion processes (and signaling) critical to tissue repair following injury [124]. Of the gene targets identified in association with S. aureus persistent carriage, EHM2, PTPN3, SORBS1, and MKLN1 can impact the integrity of focal adhesions, which in turn alters the cytoskeleton [53,120,124-133].
SORBS1 encodes CAP (Cbl-Associated Protein) [129,132,134], which affects insulin receptor signaling and also functions as a cytoskeletal regulatory protein [129]. In fibroblasts, when CAP associates with actin stress fibers, focal adhesion kinase binds CAP, and CAP over-expression induces the development of actin stress fibers and focal adhesions that physically link intracellular actin bundles to the extracellular substrates of many cell types [127,135]. Various pathogens like S. aureus usurp focal adhesions as a means of triggering their uptake by various non-professional antigen presenting cells, including epithelial/endothelial cells, osteoclasts, kidney cells, fibroblasts, and keratinocytes [120]. S. aureus possesses various fibronectin binding proteins (e.g., FnbpA, FnbpB, ClfA, ClfB) that facilitate coating the bacterial surface with this matrix molecule, which in turn binds to α5β1 integrins, resulting in the formation of a molecular bridge linking S. aureus to the host cell [125]. This interaction triggers the recruitment of focal adhesion proteins that further alter the cytoskeleton, facilitating attachment, invasion, and the ability to persist in the host [125]. The importance of this interaction for the successful attachment/invasion of human cells by staphylococci was demonstrated by generating fnbpA/fnbpB-deficient S. aureus that less effectively infected epithelial cells and caused less severe disease in a mastitis model [136,137]. Furthermore, cells unable to form focal adhesions were resistant to integrin α5β1-mediated cellular invasion by S. aureus [120,127].
EHM2 is a member of the 4.1R, ezrin, radixin, moesin (FERM) protein superfamily consisting of over 40 proteins that contain the characteristic 3-lobed FERM domain on the N-terminus, which binds various cell membrane-associated proteins and lipids, and the spectrin/actin binding domain (SABD) at the C-terminus [126]. The PTPN3 gene product also belongs to the FERM family and is a protein phosphatase that is a structural constituent of the cytoskeleton shown to play a role in T cell activation and maintenance of tight junction integrity (between the cell membrane and the cytoskeleton); both EHM2 and PTPN3 gene products are associated with focal adhesions [95,128,130,133,138]. EHM2 expression has been observed in healing wounds (primarily at the wound's leading edge), where it functions as a positive regulator of keratinocyte adhesion and motility and affects the rates of cellular invasion and adhesion to collagen via regulation of matrix metalloprotease 9 (MMP9); for example, EHM2 knockdown cells expressed significantly reduced levels of MMP9. This is of interest in the context of S. aureus since up- or down-regulation of MMP9 levels has been shown to affect disease progression resulting from S. aureus infections; that is, MMP9 levels that are either too high or too low can negatively affect wound healing, and MMP9-deficient mice poorly controlled S. aureus infections [60,126,131,139-143]. In addition, MMPs play critical roles in tissue remodeling (including the maintenance of the ECM), altering immune cell migration and infiltration patterns, and impacting inflammation by exerting effects on cytokines and chemokines [143,144]. As it relates to S. aureus colonization, a role for MMP9 has yet to be described; however, staphylococcal lipoteichoic acid has been shown to increase production of MMP9 in middle ear epithelial cells, suggesting that increased MMP9 levels could be involved in progression of otitis media [141].
Unlike EHM2, PTPN3, and SORBS1, the MKLN1 gene product muskelin mediates ECM binding via complex mechanisms involving interactions between different thrombospondin-1 (TSP-1) domains and various ligands (expressed by different cell types), including integrins, proteoglycans, or integrin-associated proteins. Alterations to muskelin expression levels altered attachment to TSP-1 in association with subtle changes to the organization of focal contacts [53]. Since TSP-1 has also been shown to serve as a ligand for S. aureus, polymorphisms in MKLN1 could alter staphylococcal binding or prevent clearance of S. aureus, because TSP-1 breakdown products function as antimicrobial peptides (AMPs) that have broad antibacterial properties affecting both Gram-positive and Gram-negative bacteria [145][146][147][148].
Homozygous mutations in ALDH18A1 (or other genes, e.g., PYCR1, ATP6V0A2) can result in a heterogeneous group of rare diseases characterized by loose or wrinkly skin known as cutis laxa [149][150][151]. Histologic analysis of skin from cutis laxa patients identified reduced elastin levels with less well-defined collagen fibers lacking the characteristic wavy morphology. In addition, collagen I and III levels were significantly reduced, and fibroblasts harvested from the dermis presented with reduced growth rates [149]. The majority of studies that have examined genes resulting in this rare condition have only described case reports of patients with homozygous mutations, making it difficult to interpret how polymorphisms with a less pronounced phenotype present at the cytoskeletal level.
Although adherence to host surfaces also represents a component of intermittent carriage (i.e., the organism has to attach to host tissues even if this association is transient), the intermittent periods of carriage, carriage of different strains over time, carriage of multiple strains, and the reduced S. aureus inocula recoverable from the nares of intermittent carriers suggest that different determinants are associated with this phenotype [17]. This is emphasized by the observation that the majority of gene targets associated with intermittent carriage were also associated with immune function/inflammation.
Our data suggested that determinants associated with persistent carriage and intermittent carriage differed. A limitation of the present study was the analysis of only two nasal swabs to establish carriage. Even though Nouwen et al. established that the 'two-culture' rule was 93.6% reliable [39] and numerous studies have used this approach to establish S. aureus carriage phenotypes [20,38,118,152-156], there remains room for classification error. Second, because only one nostril was sampled, some participants may have been misclassified as intermittent or non-carriers; one study described differences in S. aureus carriage between nostrils [157], although two other studies did not identify any differences [158,159]. It should be noted that samples collected and analyzed for the present study were of the ciliated pseudostratified columnar epithelium associated with the inferior and middle conchae and not the nonkeratinized, squamous epithelium present in the anterior nares and used to establish S. aureus carriage by other studies. Furthermore, due to population differences and limited power, we should be cautious in making assumptions with regard to specific genes associated with the respective carriage states. We should therefore further dissect the observation that persistent carriage of S. aureus is affected primarily by polymorphisms at the host/pathogen interface and that intermittent carriage is more likely impacted by environmental factors combined with the heterogeneity of the host immune response.
Manhattan (a) and QQ plots (b) of results of gene-based burden tests of rare functional variation in VAAST for intermittent S. aureus carriage versus non-carriers, including diabetes, PC1, and PC2 as covariates. The x-axis represents the chromosome number and each dot represents one protein-coding gene.
QQ plot shows the observed versus expected p-values for all protein-coding genes; grey shading represents the 95% confidence interval, and the red line indicates the null distribution of p-values.
Fig S. Protein-protein interactions among top-5 candidate genes in the gene-based test of intermittent carriers versus non-carriers analysis. Red: genes that encode proteins with direct interactions to another top-5 candidate; blue: genes that encode proteins with second-degree interactions to another top-5 candidate; grey: genes that are not top-5 candidates, but encode proteins interacting with at least two top-5 candidates. The figure was generated using DAPPLE software. (DOCX)
S1 Table. Previous genes and SNPs associated with S. aureus carriage or infection.
Odyssey Program. This work was also partially supported by a grant from the Kleberg Foundation to E.L.B. Genotyping imputation and whole exome sequencing were performed as part of our involvement in the T2D-GENES Consortium and we acknowledge those efforts. We also express appreciation to the field staff in Starr County who contacted and collected the necessary participant data and samples. Lastly, we thank the participants for their generous and willing participation.