Identifying Host Genetic Risk Factors in the Context of Public Health Surveillance for Invasive Pneumococcal Disease

  • Jairam R. Lingappa,

    Contributed equally to this work with: Jairam R. Lingappa, Logan Dumitrescu

    Affiliation Departments of Global Health, Medicine and Pediatrics, University of Washington, Seattle, Washington, United States of America

  • Logan Dumitrescu,

    Contributed equally to this work with: Jairam R. Lingappa, Logan Dumitrescu

    Affiliation Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America

  • Shanta M. Zimmer,

    Affiliation Emory University, Atlanta, Georgia, United States of America

  • Ruth Lynfield,

    Affiliation Minnesota Department of Health, St. Paul, Minnesota, United States of America

  • Janet M. McNicholl,

    Affiliation Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

  • Nancy E. Messonnier,

    Affiliation Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

  • Cynthia G. Whitney,

    Affiliation Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

  • Dana C. Crawford

    Affiliations Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America

Host genetic factors that modify risk of pneumococcal disease may help target future public health interventions to individuals at highest risk of disease. We linked data from population-based surveillance for invasive pneumococcal disease (IPD) with state-based newborn dried bloodspot repositories to identify biological samples from individuals who developed invasive pneumococcal disease. Genomic DNA was extracted from 366 case and 732 anonymous control samples. TagSNPs were selected in 34 candidate genes thought to be associated with host response to invasive pneumococcal disease, and a total of 326 variants were successfully genotyped. Among 543 European Americans (EA) (182 cases and 361 controls), and 166 African Americans (AA) (53 cases and 113 controls), common variants in surfactant protein D (SFTPD) are consistently underrepresented in IPD. SFTPD variants with the strongest association for IPD are intronic rs17886286 (allelic OR 0.45, 95% confidence interval (CI) [0.25, 0.82], with p = 0.007) in EA and 5′ flanking rs12219080 (allelic OR 0.32, 95%CI [0.13, 0.78], with p = 0.009) in AA. Variants in CD46 and IL1R1 are also associated with IPD in both EA and AA, but with effects in different directions; FAS, IL1B, IL4, IL10, IL12B, SFTPA1, SFTPB, and PTAFR variants are associated (p≤0.05) with IPD in EA or AA. We conclude that variants in SFTPD may protect against IPD in EA and AA and genetic variation in other host response pathways may also contribute to risk of IPD. While our associations are not corrected for multiple comparisons and therefore must be replicated in additional cohorts, this pilot study underscores the feasibility of integrating public health surveillance with existing, prospectively collected, newborn dried blood spot repositories to identify host genetic factors associated with infectious diseases.

Introduction


Streptococcus pneumoniae (pneumococcus) is a Gram-positive, encapsulated bacterium and a leading cause of pneumonia, meningitis and bloodstream infection in children and the elderly. An estimated 62,000 cases and 6,000 deaths occur annually from invasive pneumococcal disease (IPD) in the U.S.[1] Globally, IPD causes over a million deaths in children under the age of 5. Asymptomatic nasopharyngeal colonization with S. pneumoniae is widespread, but few of those colonized develop IPD. In the U.S., infection rates vary by race and are higher in African Americans and American Indians than in persons of European descent.[2]

In 2000 a protein-polysaccharide conjugate vaccine (PCV7) targeting seven of the 90 known serotypes of S. pneumoniae was licensed in the U.S.; in 2010 a 13-valent vaccine was also licensed. Population-based surveillance for IPD through Active Bacterial Core surveillance (ABCs) in the U.S. documented an 81% decline in incidence of IPD in children less than two years of age after introduction of PCV7.[1], [3] Nonetheless, cases of IPD caused by serotypes not covered by the vaccine continue to occur. Identifying sensitive and specific risk factors for IPD may help in implementing public health measures to prevent IPD.

Risk of IPD has been associated with pathogen virulence, host susceptibility and epidemiologic factors. Population-based public health surveillance for IPD has been a critical tool for identifying epidemiologic risk factors for disease including cigarette smoking, recent viral respiratory infection, chronic medical conditions, immunosuppression and lower socioeconomic status.[4] However, many of these factors are not easily amenable to public health prevention interventions.

Specific clinical conditions (e.g., HIV-1 infection and sickle cell disease) and variants in some host genes [5] are known to modify risk for IPD. Furthermore, molecular pathways have been identified that may be critical in host response to IPD.[5] Public health surveillance for IPD, such as through ABCs, captures data for population-based case cohorts, but without prospective collection of host genetic samples. Separately, most states collect dried bloodspots from newborns (nDBS) on a population basis to screen for specific inherited traits. Recently, we reported preliminary results of a pilot study to integrate ABCs invasive bacterial disease data with a state-based nDBS repository to identify genetic samples from cases and controls.[6] Here we use this approach to evaluate candidate host genetic variants as risk factors for IPD.

Results


tagSNPs associated with IPD

We performed single SNP allelic tests of association stratified by race/ethnicity in 543 European-Americans (EA), including 182 cases of IPD and 361 controls (Table 1) for 212 SNPs (Tables 2 and S1); and in 166 African-Americans (AA), including 53 cases and 113 controls for 287 SNPs. Because of the differences in sample size and its impact on power, we evaluated results based on both p-value and consistent direction of effect between the two ancestral groups. Overall, comparing IPD cases to controls, 17 tagSNPs in nine genes (CD46, SFTPA1, SFTPD, IL1B, IL1R1, IL4, IL10, IL12B, FAS) in EA and 11 tagSNPs in six genes (CD46, SFTPB, SFTPD, IL1B, IL1R1, and PTAFR) in AA have common variants that are associated at a liberal threshold of p≤0.05 (Tables 3 and S2, and Figure 1). Of these, three candidate genes, SFTPD, CD46 and IL1R1, have tagSNPs associated with IPD in both EA and AA; these associations are further described below.

Figure 1. Summary of tagSNP associations for European-Americans and African Americans.

Genomic and association characteristics for tagSNPs with p≤0.05 for IPD among either EA or AA are shown. Threshold lines for p = 0.05 (-log10 (0.05)  = 1.3) and OR = 1 are indicated in the relevant forest plots.

Table 3. Summary of tagSNPs associated with IPD in European-Americans and/or African-Americans.

Variants in SFTPD have the strongest and most consistent associations across EA and AA. SFTPD intronic variants rs17886286 and rs1998374 are underrepresented among EA IPD cases compared to controls (allelic OR 0.45 and 0.60, with p = 0.007 and 0.023, respectively). These variants are in moderate linkage disequilibrium (LD) with each other in EA (SeattleSNPs r2 = 0.732) and thus do not represent independent associations. Also, these associations do not follow an additive genetic model given the observation that the genetic effect estimates (odds ratios) for heterozygotes and homozygotes are similar for each of these variants, individually (Table S2). Among AA, two different SFTPD variants (5′ flanking rs12219080 and intronic rs17878441) are underrepresented in IPD cases compared to controls (allelic OR 0.32, and 0.36, with p = 0.009 and 0.020, respectively). These SNPs are not in high LD in AA (SeattleSNPs r2 = 0.352). The association between SFTPD rs17878441 and IPD is specific to AA given that this SNP is very rare among the EA cohort presented here (1 heterozygote case and 1 heterozygote control among 508 individuals tested, overall MAF = 0.002) and monomorphic in reference populations such as CEU HapMap and SeattleSNPs EA. SFTPD rs1998374 and rs12219080 have consistent directions of effect in EA and AA, although neither is associated in both groups at p≤0.05 (Table 3 and Figure 1).

To explore the joint effects of associated variants in SFTPD, we calculated a weighted Genetic Risk Score (GRS) in EA and AA for significant associations with SFTPD variants (rs1998374 and rs17886286 in EA, and rs12219080 and rs17878441 in AA). Together, the two EA SFTPD variants were associated with IPD at an OR = 0.47 (95%CI: 0.26, 0.84) and p-value = 0.011. Additionally, the two AA SFTPD variants were associated with IPD at an OR = 0.36 (95%CI: 0.17, 0.77) and p-value = 0.009. These results must be viewed cautiously, as the method assumes each SNP to be independently associated with IPD status; however, the SNPs used in the GRS are in moderate LD (r2 = 0.50 in EA, r2 = 0.31 in AA).

Variants in CD46 (three in EA and one in AA) are overrepresented in IPD compared to controls: among EA, intronic rs1962149, rs2488255, and rs2724385 have allelic ORs between 1.30 and 1.33, and p-values from 0.028 to 0.048 with moderate to strong LD (HapMap CEU r2 ranging from 0.684 to 1.00 in pair-wise comparisons). Intronic CD46 rs41317049 is overrepresented in IPD among AA compared to controls (allelic OR 2.42 with p = 0.042); rs1962149 has a consistent direction of effect in EA and AA, but with p>0.05 among AA.

Finally, seven variants in IL1R1 (three in EA and four in AA) are associated with IPD compared to controls at the p≤0.05 threshold. Among EA, intronic rs2160227, rs2287047, and rs3917318 are underrepresented among IPD cases compared to controls with very similar allelic ORs (0.72 to 0.73) and p-values (0.038–0.040). Among EA, LD is high for some (SeattleSNPs r2 = 1.00 for rs2160227 and rs3917318, and r2 = 0.80 for rs2287047 and rs2287049), but not all variants (e.g., SeattleSNPs r2 = 0.302 for rs3917318 and rs2287047). The direction of effect for IL1R1 variants associated in AA is not consistent: OR point estimates range from 0.57 to 8.75 with p-values of 0.020 to 0.042 (Table 3). Between EA and AA, the minor alleles for IL1R1 intronic SNPs rs3917318 and rs2160227 and 5′ flanking rs949963 are consistently underrepresented among cases compared with controls regardless of race/ethnicity.

Among genes with variants associated with IPD in EA but not AA, rs1800894 near IL10 is most overrepresented in IPD with an allelic OR 2.89 and p = 0.004. Although not significant, the direction of effect is the same in AA for this IL10 SNP (OR 1.67, 95% CI [0.35, 7.35]). IL12B intronic variants rs919766 and rs2195940 are overrepresented in EA for IPD (allelic OR 1.85, p = 0.008 and 1.68, p = 0.05, respectively); among EA, these variants are in complete LD (SeattleSNPs r2 = 1.00). Variant rs2195940 maintains the same direction of effect with a similar magnitude in AA, though this association does not reach statistical significance. Furthermore, in a genotypic test of association, this variant is associated with a large odds ratio among homozygotes compared to heterozygotes (Table S2). Finally, SFTPA1 has two variants (intergenic rs4253457 and intronic rs1914663) with similar allelic OR in EA (0.52 and 0.53, respectively). Both are in complete LD in EA (SeattleSNPs; r2 = 1.00) but are in low LD in AA (SeattleSNPs r2 = 0.119). Of these, rs1914663 has a similar direction of effect among AA.

Among genes with variants associated with IPD in AA but not EA, rs905907 near PTAFR has the strongest association with an allelic OR of 2.38 and p = 0.005. The direction of effect in EA is opposite to that observed in AA for this association (OR = 0.90; 95% CI: 0.454, 1.86).

Finally, for IL1B, variants rs2853550 and rs3917365 were overrepresented, but with low LD in EA (r2 = 0.352), while variants rs1143642 and rs2853550 were underrepresented and with low LD in AA (r2 = 0.141) but moderate LD in EA (r2 = 0.650). rs2853550 was the only tagSNP in our analysis that was associated (p≤0.05) in both AA and EA; but this SNP had opposite effects in each racial group.

tagSNPs associated with specific clinical outcomes

As expected, most tests of association conducted within specific clinical outcomes give results similar to those of tests performed in the larger IPD study. However, we did observe potentially important differences. Specifically, for pneumococcal bacteremia, 15 tagSNPs in eight genes (CD46, SFTPA1, SFTPD, IL1R1, IL4, IL12B, IL18, and MYD88) and seven tagSNPs in seven genes (MBL2, SFTPD, IL1B, IL1R1, IL4, and PTAFR) are associated with cases of pneumococcal bacteremia compared to controls in EA and AA, respectively (Table S3). Thus, in addition to the genes identified as having variants in EA associated with IPD, the MYD88 variant rs7744 in the 3′ untranslated region is overrepresented in bacteremia cases among EA (allelic OR 1.49, p = 0.046), with a similar effect in AA (allelic OR 1.50) that did not reach the p≤0.05 threshold. Also, the MBL2 variant rs930507 is overrepresented in bacteremia cases among AA (allelic OR 1.82, p = 0.046) with a similar effect in EA (allelic OR 1.19) but with p>0.3.

Additional tagSNPs, particularly in SFTPD, are associated with pneumonia and meningitis syndromes in EA (Table S4). However, the sample sizes associated with these outcomes are small (n = 17 cases of meningitis, and n = 25 cases of pneumonia) precluding meaningful interpretation of these associations. The AA sample size associated with these outcomes is also too small (n = 2 cases and n = 6 cases, respectively) to permit further analysis.

Discussion


In our evaluation of a population-based cohort for host genetic variation associated with IPD in children, we identified 27 tagSNPs in 11 genes (CD46, SFTPA1, SFTPB, SFTPD, IL1B, IL1R1, IL4, IL10, IL12B, FAS and PTAFR) associated in EA or AA at a liberal significance threshold of p≤0.05. In particular, in EA and AA, variants in SFTPD (gene ID 6441), the gene encoding surfactant protein D (SP-D), are consistently underrepresented in IPD and pneumococcal bacteremia cases compared to controls, suggesting that variants in this gene or those in linkage disequilibrium with them may confer protection from IPD. This is the first study linking SFTPD gene variation specifically with clinical IPD.

SP-D is a member of the collectin subgroup in the C-type lectin superfamily, which includes surfactant protein A (SP-A) and mannose binding protein. SP-D and SP-A are found primarily in the respiratory tract and other mucosal surfaces, and recent data suggest that they impact respiratory infections on multiple levels. Surfactant collectins broadly bind carbohydrates and lipids on the surface of bacteria and viruses, with specific binding of SP-D to S. pneumoniae reported.[7] SP-D deficient (sftpd−/− knockout) mice show persistent pneumococcal colonization, decreased clearance of bacterial pathogens, and early onset and increased levels of S. pneumoniae bacteremia when colonized.[7] Overall, collectins exhibit both pro- and anti-inflammatory effects[8]: SP-D stimulates phagocytosis and scavenging of apoptotic cells with pro-inflammatory consequences.[9] Yet SP-D and SP-A bind SIRPα,[10] TLR2 and TLR4,[11] and CD14 [12] through their globular carbohydrate recognition domain (CRD) to down-regulate inflammatory cytokines,[11], [13] and sftpd−/− knockout mice exhibit high levels of pulmonary inflammation.[14] These findings have led to speculation that collectins have dual roles: if the collectin collagenous tail is bound in the absence of a pathogen stimulus, an anti-inflammatory response results, possibly mitigating damage from incidental environmental stimuli.[15] However, when pathogen signals are present, pulmonary collectins may provide pro-inflammatory stimuli for pathogen phagocytosis and NF-κB-mediated cytokine release.[10] Further study is needed to confirm our tagSNP associations and to dissect how protection from IPD by SFTPD variants reflects regulatory functions of SP-D.

Our analysis also identified variants in other innate immune and coagulation pathway genes (e.g., CRP and PTAFR) and inflammatory mediators (e.g., IL1R1, IL1B, IL12B and IL10) that may be associated with IPD. Because this was an exploratory study and we did not correct for multiple comparisons, definitive interpretation of these findings will require confirmation in larger cohort studies. Nevertheless, our findings support the involvement of multiple pathways in host response to IPD. Recent studies also suggest that additional genes in the toll-like receptor-signaling pathway (e.g., NFKB, IKB and MAL) may influence response to IPD.[16]–[18] Furthermore, the collectin MBL2 had variants overrepresented in pneumococcal bacteremia and meningitis, but not for overall IPD. This suggests the possibility of syndrome-specific host genetic associations, but our study was underpowered to definitively evaluate this.

For this analysis, we took an indirect association approach by selecting and genotyping SNPs that are either causative SNPs or in LD with the causative SNP. The latter situation most likely applies to the majority of SNPs found associated with IPD in this study, as 18 of the 27 (67%) associated SNPs are located in introns. Furthermore, of the four SFTPD variants associated with IPD in EA and/or AA (including rs17886286 and rs1998374), all are intronic. Notably, in SeattleSNPs EA, rs17886286 and rs1998374 are in complete (r2 = 1.00) or high (r2 = 0.732) LD with rs3088308, a coding non-synonymous SNP, while in SeattleSNPs AA, these SNPs have little to no LD with rs3088308 (r2 = 0.002 for rs17886286 and r2 = 0.028 for rs1998374) and are not associated with IPD in AA. The two SFTPD variants that are associated with IPD in AA are in moderate LD with a different coding non-synonymous SNP, rs4469829, which is monomorphic in EA (r2 = 0.613 for rs17878441 and r2 = 0.620 for rs12219080). Furthermore, SFTPD variant rs721917, a non-synonymous SNP known to reduce serum levels of SP-D in EA, is in LD with rs1998374 in AA (SeattleSNPs r2 = 0.645) but not EA (SeattleSNPs r2 = 0.096). Thus, differences in LD patterns between ancestral populations may help to explain the disparate signals observed in EA compared to AA.

Our primary goal was to assess the feasibility of cross-linking surveillance data with an nDBS repository to perform tagSNP genomic studies, and toward this end we were highly successful: 82% of surveillance cases were linked to an nDBS, and 88% of samples were successfully genotyped. Several key issues associated with this experience deserve emphasis. First, the completeness of IPD case surveillance in ABCs through use of active surveillance methods and routine audits of laboratory records, combined with the overall low incidence of IPD in the general population, supports our assumption that controls were at low risk of having had IPD outside the surveillance time-period. Second, efficient linking of surveillance cases to nDBS samples was critical to minimize bias, but this linkage depends on consent requirements for nDBS use, which differ by state and continue to evolve.[19] Third, nearly a quarter of individuals identified as European-American through surveillance were found to have genetic characteristics indicating >10% African ancestry. Given differences in allele frequencies and LD between EA and AA populations, misclassification of ancestry can result in confounding. Although self-reported ancestry can be accurate in some settings,[20] our study underscores that more complete genetic evaluation for ancestry may be important to control for population stratification in some settings. Finally, the tagSNP approach we used is based on the assumption that most disease-causing variation will be captured through LD with common tagSNPs. Recent concern [21], [22] that low frequency host variation contributes substantially to disease causation may underscore the need for large-scale gene sequencing rather than a tagSNP analysis. Our earlier findings demonstrating that gene sequencing can be performed with high fidelity using DNA from these nDBS samples [6] suggest that application of large-scale sequencing for initial variation discovery could be useful in future replication studies.

Our findings are not corrected for multiple comparisons and therefore should be viewed as preliminary, with definitive proof requiring replication in additional cohorts. However, given our study results, the wealth of existing public health surveillance data, and the existing large repositories of nDBS, replication studies powered to detect associations with IPD and with invasive disease caused by other vaccine-preventable encapsulated bacteria (e.g., N. meningitidis and H. influenzae) should be feasible. Such studies could help further define host genetic risk factors for IPD and other infectious diseases, and permit economic and attributable risk analyses to determine the usefulness of such risk factors for implementing public health prevention interventions.

Materials and Methods

Ethics Statement

The protocol for this study was reviewed and approved by the institutional review boards at the Minnesota State Department of Health and the Centers for Disease Control and Prevention (CDC). Written informed consent was solicited from individuals providing nDBS or their guardians. The ethics approvals allowed for use of anonymous samples in cases where the individuals or their guardians provided written informed consent or where they could not be contacted; nDBS samples from individuals who declined consent were removed from this analysis. After determination of consent status, the database was made anonymous by delinking from surveillance identifiers, at which point CDC closed the project, allowing anonymous genotyping to proceed using the coded nDBS and study database. The University of Washington subsequently provided a certificate of exemption for this study.

Subject Ascertainment

As previously described [6], IPD cases in the state of Minnesota were identified through Active Bacterial Core surveillance (ABCs), coordinated through the Centers for Disease Control and Prevention's Emerging Infections Program. Cases of IPD were defined as individuals born between January 1, 1997 and December 31, 2000 who had isolation of S. pneumoniae from a normally sterile site during the same time period. nDBS collected from cases during the course of normal newborn screening were identified by cross-linking ABCs identifiers with the Minnesota newborn screening program. For each case nDBS identified, two anonymous control nDBS were selected based on surveillance and newborn screening data by matching case and control race/ethnicity, date of birth, and, when possible, hospital of birth. Parents or guardians of cases were contacted by mail for written consent. Surveillance data and nDBS were included for all cases with parental consent and those who did not respond after two mailings. ABCs data and case and control nDBS were stripped of linkage to personal identifiers.

Study cohort

Overall, 445 individuals were identified as meeting the IPD case definition. Of these cases, stored nDBS were identified for 366 (82%) with a median age of 1.1 years (range 0–3.2 years) and nearly equal numbers of males and females (Table 1). Anonymous nDBS were matched to cases by date of birth and identified race for 732 controls (82% of controls were also matched to cases by hospital of birth). Among these 1098 cases and controls (Surveillance Cohort), amplified DNA of adequate quality for genotyping was obtained from 967 (88%) representing 330 cases and 637 controls (Genotyped Cohort). Population substructure analysis indicated that 149 individuals (58 cases and 91 controls) in the Genotyped Cohort who had been specified in the surveillance data as having European American (EA) ancestry had genotypes consistent with ≥10% African ancestry. These individuals were omitted from the final Analysis Cohort, leaving 543 EA (182 cases and 361 controls), 166 AA (53 cases and 113 controls), and 109 from other ancestral backgrounds (Table 1). The 709 individuals (235 cases and 474 controls) of EA and AA race/ethnicity are referred to as the Analysis Cohort.

Candidate Gene and tagSNP Selection

We selected 34 candidate genes (Tables 2 and S1) based on publications prior to 2007 reporting an association of the gene or protein product with IPD, pneumonia or sepsis, or laboratory evidence for a role in host response to S. pneumoniae or other encapsulated bacteria (e.g., N. meningitidis). TagSNPs were selected for African-Americans and European-Americans using software LDSelect [23] (r2>0.64; minor allele frequency >5%) and data from SeattleSNPs [24], the Environmental Genome Project [25], Perlegen [26], and the International HapMap Project [27]. Amplified DNA samples were genotyped for 384 tagSNPs with a custom Illumina GoldenGate assay [28] by the Center for Inherited Disease Research (CIDR) as part of the National Heart Lung and Blood Institute (NHLBI) Re-sequencing and Genotyping (RSnG) service.
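The idea of grouping SNPs into LD bins at an r2 threshold and picking one tag per bin can be illustrated with a simplified greedy sketch. This is not the LDSelect software itself: the function `greedy_tag_bins`, the SNP names, and the pairwise r2 values are hypothetical, and only the r2 > 0.64 threshold is taken from the text.

```python
def greedy_tag_bins(snps, r2, threshold=0.64):
    """Greedily group SNPs into LD bins and choose one tag SNP per bin.

    snps: list of SNP identifiers (assumed already filtered on MAF > 5%).
    r2:   dict mapping frozenset({a, b}) -> pairwise r^2 (hypothetical input).
    Returns a list of (tag, members) tuples.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) > threshold

    def n_links(s, pool):
        return sum(linked(s, t) for t in pool if t != s)

    remaining = list(snps)
    bins = []
    while remaining:
        # The SNP exceeding the threshold with the most remaining SNPs becomes the tag.
        tag = max(remaining, key=lambda s: n_links(s, remaining))
        members = [s for s in remaining if s == tag or linked(tag, s)]
        bins.append((tag, members))
        remaining = [s for s in remaining if s not in members]
    return bins


# Hypothetical pairwise r^2 values for four made-up SNPs; unlisted pairs count as 0.
example_r2 = {
    frozenset(("snpA", "snpB")): 0.90,
    frozenset(("snpA", "snpC")): 0.70,
    frozenset(("snpB", "snpC")): 0.50,
}
bins = greedy_tag_bins(["snpA", "snpB", "snpC", "snpD"], example_r2)
```

In this toy input, `snpA` tags a bin containing `snpB` and `snpC`, while `snpD` forms its own bin because it is not linked to any other SNP above the threshold.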

DNA Extraction and SNP Genotyping

De-identified genomic DNA extracted from 1,022 dried blood spots was whole genome amplified using a multiple displacement method (Molecular Staging, Inc. (MSI), New Haven, CT USA)[29] from a single 3-mm punch from ½ inch dried blood spots.

A total of 326 (85%) of 384 selected tagSNPs were successfully genotyped (Tables 2 and S1). The 326 successfully genotyped tagSNPs fell into four broad biological categories: 13 innate immune response genes with 126 SNPs (39%), 13 pro- and anti-inflammatory mediators with 139 SNPs (43%), three coagulation pathway genes with 20 SNPs (6%), and five genes in activation/apoptosis pathways with 41 SNPs (12%) (Tables 2 and 1). Genes with 50% or more SNPs targeted but not successfully genotyped included: FCGR3A (8 of 9 SNPs failed genotyping), PROC (9 of 14 SNPs), TLR9 (4 of 7 SNPs) and FCGR2A (3 of 6 SNPs). For the IPD analysis, a total of 212 (65%) and 287 (88%) genotyped SNPs had >95% genotyping efficiency, had a minor allele frequency (MAF) >0.01, and met Hardy-Weinberg Equilibrium (HWE) criteria of p>0.0001 among EA and AA, respectively. The majority of SNPs genotyped for CD40LG (10 of 14) were out of HWE (p<0.0001 in EA controls) and subsequently omitted from further analyses. For the bacteremia case definition, 120 cases and 361 controls were included for EA, with 213 SNPs passing QC, and 39 cases and 113 controls were included for AA, with 281 SNPs passing QC. We also report genotyping data from EA for the 25 cases of pneumonia and 17 cases of meningitis compared to 361 controls, with 211 SNPs passing QC for each of those analyses.
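The three per-SNP quality control criteria described above (genotyping efficiency >95%, MAF >0.01, HWE p>0.0001) can be sketched as a simple filter. This is an illustration only, not the pipeline used in the study: the function names, the use of a 1-df Pearson chi-square for HWE, and the choice to compute call rate and MAF over all samples are assumptions, and the genotype counts are invented.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(control_counts, all_counts, n_samples,
              min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-4):
    """Apply call-rate, MAF, and HWE thresholds to one SNP.

    control_counts, all_counts: (n_AA, n_AB, n_BB) genotype counts.
    n_samples: number of samples attempted for this SNP.
    """
    n_called = sum(all_counts)
    if n_called == 0:
        return False
    call_rate = n_called / n_samples
    freq_a = (2 * all_counts[0] + all_counts[1]) / (2 * n_called)
    maf = min(freq_a, 1 - freq_a)
    return (call_rate > min_call_rate
            and maf > min_maf
            and hwe_chi2_p(*control_counts) > min_hwe_p)


# Invented counts: the first SNP is close to HWE, the second has no heterozygotes.
snp_ok = passes_qc((81, 18, 1), (120, 27, 2), n_samples=150)
snp_bad_hwe = passes_qc((50, 0, 50), (75, 0, 75), n_samples=150)
```

An exact HWE test is often preferred for rare alleles; the chi-square form is used here only to keep the sketch short.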

Study Population Demographics and Ancestry

Population stratification was evaluated using STRUCTURE (version 2.2) prior to performing SNP quality control or testing for association.[30], [31] Using 274 SNPs with a minor allele frequency >1%, discrepancies were noted between genotype-inferred ancestry and the ancestry assigned in the surveillance database. Samples labeled as European-American that were inferred to have >10% African-American ancestry were omitted from further analysis.

Statistical Analysis

Only SNPs that passed quality control measures (minor allele frequency >1%, Hardy-Weinberg Equilibrium (HWE) p>0.0001 in controls, and genotyping efficiency >95%) were considered for analysis. All analyses were stratified by race/ethnicity, and tests of association were performed using PLINK [32] or STATA (version 10). Due to the small number of non-European, non-African samples, we did not analyze data from those samples. Exclusion of individual case or control samples due to genotyping quality control criteria and genetic ancestry disrupted the original case-control matching. In order to maximize the sample size for this exploratory analysis, tagSNP associations were assessed through an unmatched comparison of IPD cases to controls. Two-tailed χ2 tests were used with p≤0.05 considered significant without correction for multiple comparisons. Both allelic and genotypic tests of association were performed for all SNPs. LD estimates were obtained through the SeattleSNPs Genome Variation Server using data from the International HapMap Project [27] and SeattleSNPs [24] as indicated.
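An allelic (2×2) test of the kind described above can be sketched as follows. This is a minimal illustration, not the PLINK or STATA implementation: the function name is hypothetical, the Pearson χ2 without continuity correction and the Woolf (log odds ratio) confidence interval are assumptions about the method, and the allele counts are invented.

```python
import math


def allelic_test(case_minor, case_major, ctrl_minor, ctrl_major):
    """Allelic chi-square test (1 df), odds ratio, and Woolf 95% CI.

    Inputs are allele counts (two alleles per genotyped individual).
    All four cells are assumed to be non-zero.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Two-tailed p-value from the chi-square survival function with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (ci_low, ci_high), chi2, p_value


# Invented counts: 100 cases (20 minor alleles) and 100 controls (40 minor alleles).
odds_ratio, ci, chi2, p_value = allelic_test(20, 180, 40, 160)
```

With these invented counts the minor allele is less common in cases, so the odds ratio falls below 1 and its interval excludes 1, the same pattern the paper describes as a variant being "underrepresented" in cases.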

A weighted Genetic Risk Score (GRS) was calculated for SFTPD by PLINK for every participant, stratified by race/ethnicity, using SNPs that were associated with IPD status at p<0.05. The GRS is simply a sum across SNPs of the number of risk alleles (0, 1, or 2) at that SNP, multiplied by the odds ratio of the association. Participants with incomplete genotype data at any SNP used in the GRS were excluded from analysis. Logistic regression, with continuous GRS as the independent variable, was used to evaluate the joint effects of associated genetic variants with IPD status.
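The score described above can be sketched as a per-participant weighted sum. This follows the wording of the text (allele count multiplied by the odds ratio) rather than PLINK's internal scoring, and the function name, participant identifiers, and genotypes are hypothetical; only the two SNP identifiers and their allelic odds ratios in EA are taken from the Results.

```python
def weighted_grs(genotypes, weights):
    """Weighted genetic risk score per participant.

    genotypes: dict participant -> dict SNP -> scored-allele count
               (0, 1, or 2), or None when the genotype is missing.
    weights:   dict SNP -> weight (here, the allelic odds ratio).
    Participants missing any SNP in `weights` are excluded.
    """
    scores = {}
    for person, calls in genotypes.items():
        counts = [calls.get(snp) for snp in weights]
        if any(c is None for c in counts):
            continue  # incomplete genotype data: drop this participant
        scores[person] = sum(c * w for c, w in zip(counts, weights.values()))
    return scores


# Allelic ORs reported for EA; the genotypes below are invented.
ea_weights = {"rs17886286": 0.45, "rs1998374": 0.60}
example_genotypes = {
    "p1": {"rs17886286": 1, "rs1998374": 2},
    "p2": {"rs17886286": 0, "rs1998374": 0},
    "p3": {"rs17886286": 2, "rs1998374": None},
}
scores = weighted_grs(example_genotypes, ea_weights)
```

The resulting continuous score would then enter a logistic regression as the independent variable. Many weighted scores use the log odds ratio as the weight instead; that choice is outside what the text states.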

The primary analysis was based on ABCs data for IPD case status. We also evaluated other clinical outcomes of pneumococcal bacteremia, pneumonia, or meningitis, defined by S. pneumoniae cultured from blood only with no other site-specific diagnosis, isolation from pleural fluid or a clinician diagnosis of pneumonia, and isolation from cerebrospinal fluid or a clinician diagnosis of meningitis, respectively. Because of the small numbers of pneumonia and meningitis cases, we used Fisher's exact test for these analyses. Association results were also plotted graphically using Synthesis-View.[33]
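For the small pneumonia and meningitis subsets, a two-sided Fisher's exact test on a 2×2 table can be sketched with the hypergeometric distribution. The function name and the example tables are hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention that may differ from the software used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Invented allele-count tables (rows: cases, controls; columns: minor, major).
p_small = fisher_exact_two_sided(3, 1, 1, 3)
p_other = fisher_exact_two_sided(8, 2, 1, 5)
```

Because the test enumerates every table with the observed margins, it avoids the large-sample approximation behind the χ2 test, which is why it suits outcomes with only a few dozen cases.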

Supporting Information

Table S1.



Table S2.

tagSNPs in candidate genes found associated with IPD. For European-Americans (182 Cases and 361 Controls), and African-Americans (53 Cases and 113 Controls). Allelic (2×2) and genotypic (2×3) models are used to calculate allelic, heterozygote and homozygote OR, 95% confidence intervals (CI) and minor allele frequency (MAF). Variants are ordered by decreasing significance of the allelic p-value.


Table S3.

tagSNPs in candidate genes found associated with bacteremia. For European-Americans (120 Cases and 361 Controls) with 213 SNPs passing QC (HWE >0.0001, MAF >0.01, Genotyping efficiency >90%) and African-Americans (39 Cases and 113 Controls) with 281 SNPs passing QC. Allelic (2×2) and genotypic (2×3) models are used to calculate allelic, heterozygote and homozygote OR, 95% confidence intervals (CI) and minor allele frequency (MAF). Variants are ordered by decreasing significance of the allelic p-value.


Table S4.

tagSNPs in candidate genes associated with pneumonia and meningitis for European-Americans (EA). 25 Cases and 361 Controls with 211 SNPs passing QC (HWE >0.0001, MAF >0.01, Genotyping efficiency >90%). Allelic (2×2) and genotypic (2×3) models are used to calculate allelic, heterozygote and homozygote OR, 95% confidence intervals (CI) and minor allele frequency (MAF). Variants are ordered by decreasing significance of the allelic p-value.

Acknowledgments



We thank Mr. Craig Morin (Minnesota State Health Department) for help with this project and Dr. Tom Taylor (CDC) for additional statistical analyses for this project that were not included in this manuscript. Dr. David Stephens (Emory University) provided advice and Dr. Robin Hampton in Dr. Stephens' lab provided laboratory support for handling DBS samples. Vanderbilt University Center for Human Genetics Research, Computational Genomics Core provided computational and/or analytical support for this work.

Author Contributions

Conceived and designed the experiments: JRL LD RL JM NEM DCC. Performed the experiments: JRL LD SMZ RL JM NEM CGW DCC. Analyzed the data: JRL LD DCC. Contributed reagents/materials/analysis tools: LD DCC. Wrote the paper: JRL LD SMZ RL JM NEM CGW DCC.


References

  1. Whitney CG, Farley MM, Hadler J, Harrison LH, Bennett NM, et al. (2003) Decline in invasive pneumococcal disease after the introduction of protein-polysaccharide conjugate vaccine. N Engl J Med 348: 1737–1746.
  2. Robinson KA, Baughman W, Rothrock G, Barrett NL, Pass M, et al. (2001) Epidemiology of invasive Streptococcus pneumoniae infections in the United States, 1995-1998: Opportunities for prevention in the conjugate vaccine era. JAMA 285: 1729–1735.
  3. Centers for Disease Control and Prevention (2005) Direct and indirect effects of routine vaccination of children with 7-valent pneumococcal conjugate vaccine on incidence of invasive pneumococcal disease–United States, 1998-2003. MMWR Morb Mortal Wkly Rep 54: 893–897.
  4. Nuorti JP, Butler JC, Farley MM, Harrison LH, McGeer A, et al. (2000) Cigarette smoking and invasive pneumococcal disease. Active Bacterial Core Surveillance Team. N Engl J Med 342: 681–689.
  5. Waterer GW, Wunderink RG (2005) Genetic susceptibility to pneumonia. Clin Chest Med 26: 29–38.
  6. Crawford DC, Zimmer SM, Morin CA, Messonnier NE, Lynfield R, et al. (2008) Integrating host genomics with surveillance for invasive bacterial diseases. Emerg Infect Dis 14: 1138–1140. PMCID: PMC2600343.
  7. Jounblat R, Kadioglu A, Iannelli F, Pozzi G, Eggleton P, et al. (2004) Binding and agglutination of Streptococcus pneumoniae by human surfactant protein D (SP-D) vary between strains, but SP-D fails to enhance killing by neutrophils. Infect Immun 72: 709–716.
  8. Forbes LR, Haczku A (2010) SP-D and regulation of the pulmonary innate immune system in allergic airway changes. Clin Exp Allergy 40: 547–562.
  9. Vandivier RW, Ogden CA, Fadok VA, Hoffmann PR, Brown KK, et al. (2002) Role of surfactant proteins A, D, and C1q in the clearance of apoptotic cells in vivo and in vitro: calreticulin and CD91 as a common collectin receptor complex. J Immunol 169: 3978–3986.
  10. Gardai SJ, Xiao YQ, Dickinson M, Nick JA, Voelker DR, et al. (2003) By binding SIRPalpha or calreticulin/CD91, lung collectins act as dual function surveillance molecules to suppress or enhance inflammation. Cell 115: 13–23.
  11. Ohya M, Nishitani C, Sano H, Yamada C, Mitsuzawa H, et al. (2006) Human pulmonary surfactant protein D binds the extracellular domains of Toll-like receptors 2 and 4 through the carbohydrate recognition domain by a mechanism different from its binding to phosphatidylinositol and lipopolysaccharide. Biochemistry 45: 8657–8664.
  12. Sano H, Chiba H, Iwaki D, Sohma H, Voelker DR, et al. (2000) Surfactant proteins A and D bind CD14 by different mechanisms. J Biol Chem 275: 22442–22451.
  13. Murakami S, Iwaki D, Mitsuzawa H, Sano H, Takahashi H, et al. (2002) Surfactant protein A inhibits peptidoglycan-induced tumor necrosis factor-alpha secretion in U937 cells and alveolar macrophages by direct interaction with toll-like receptor 2. J Biol Chem 277: 6830–6837.
  14. LeVine AM, Whitsett JA, Gwozdz JA, Richardson TR, Fisher JH, et al. (2000) Distinct effects of surfactant protein A or D deficiency during bacterial infection on the lung. J Immunol 165: 3934–3940.
  15. Liu CF, Rivere M, Huang HJ, Puzo G, Wang JY (2010) Surfactant protein D inhibits mite-induced alveolar macrophage and dendritic cell activations through TLR signalling and DC-SIGN expression. Clin Exp Allergy 40: 111–122.
  16. Chapman SJ, Khor CC, Vannberg FO, Frodsham A, Walley A, et al. (2007) IkappaB genetic polymorphisms and invasive pneumococcal disease. Am J Respir Crit Care Med 176: 181–187.
  17. Chapman SJ, Khor CC, Vannberg FO, Rautanen A, Segal S, et al. (2010) NFKBIZ polymorphisms and susceptibility to pneumococcal disease in European and African populations. Genes Immun 11: 319–325.
  18. Khor CC, Chapman SJ, Vannberg FO, Dunne A, Murphy C, et al. (2007) A Mal functional variant is associated with protection against invasive pneumococcal disease, bacteremia, malaria and tuberculosis. Nat Genet 39: 523–528.
  19. IOM (Institute of Medicine) (2010) Challenges and Opportunities in Using Residual Newborn Screening Samples for Translational Research: Workshop Summary. Washington, DC.
  20. Dumitrescu L, Ritchie MD, Brown-Gentry K, Pulley JM, Basford M, et al. (2010) Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records. Genet Med 12: 648–650.
  21. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747–753.
  22. Schork NJ, Murray SS, Frazer KA, Topol EJ (2009) Common vs. rare allele hypotheses for complex diseases. Curr Opin Genet Dev 19: 212–219.
  23. Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, et al. (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 74: 106–120.
  24. Crawford DC, Akey DT, Nickerson DA (2005) The patterns of natural variation in human genes. Annu Rev Genomics Hum Genet 6: 287–312.
  25. Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, et al. (2004) Pattern of sequence variation across 213 environmental response genes. Genome Res 14: 1821–1831.
  26. Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, et al. (2005) Whole-genome patterns of common DNA variation in three human populations. Science 307: 1072–1079.
  27. International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437: 1299–1320.
  28. Shen R, Fan JB, Campbell D, Chang W, Chen J, et al. (2005) High-throughput SNP genotyping on universal bead arrays. Mutat Res 573: 70–82.
  29. Hosono S, Faruqi AF, Dean FB, Du Y, Sun Z, et al. (2003) Unbiased whole-genome amplification directly from clinical samples. Genome Res 13: 954–964.
  30. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567–1587.
  31. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155: 945–959.
  32. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
  33. Pendergrass SA, Dudek SM, Crawford DC, Ritchie MD (2010) Synthesis-View: visualization and interpretation of SNP association results for multi-cohort, multi-phenotype data and meta-analysis. BioData Min 3: 10.
  34. Johansson L, Rytkonen A, Bergman P, Albiger B, Kallstrom H, et al. (2003) CD46 in meningococcal disease. Science 301: 373–375.
  35. Seya T, Hirano A, Matsumoto M, Nomura M, Ueda S (1999) Human membrane cofactor protein (MCP, CD46): multiple isoforms and functions. Int J Biochem Cell Biol 31: 1255–1260.
  36. Mold C, Nakayama S, Holzer TJ, Gewurz H, Du Clos TW (1981) C-reactive protein is protective against Streptococcus pneumoniae infection in mice. J Exp Med 154: 1703–1708.
  37. Mold C, Rodic-Polic B, Du Clos TW (2002) Protection from Streptococcus pneumoniae infection by C-reactive protein and natural antibody requires complement but not Fc gamma receptors. J Immunol 168: 6375–6381.
  38. Yother J, Volanakis JE, Briles DE (1982) Human C-reactive protein is protective against fatal Streptococcus pneumoniae infection in mice. J Immunol 128: 2374–2376.
  39. van der Pol WL, Huizinga TW, Vidarsson G, van der Linden MW, Jansen MD, et al. (2001) Relevance of Fcgamma receptor and interleukin-10 polymorphisms for meningococcal disease. J Infect Dis 184: 1548–1555.
  40. Yee AM, Ng SC, Sobel RE, Salmon JE (1997) Fc gammaRIIA polymorphism as a risk factor for invasive pneumococcal infections in systemic lupus erythematosus. Arthritis Rheum 40: 1180–1182.
  41. Picard C, Puel A, Bonnet M, Ku CL, Bustamante J, et al. (2003) Pyogenic bacterial infections in humans with IRAK-4 deficiency. Science 299: 2076–2079.
  42. Letiembre M, Echchannaoui H, Ferracin F, Rivest S, Landmann R (2005) Toll-like receptor-2 deficiency is associated with enhanced brain TNF gene expression during pneumococcal meningitis. J Neuroimmunol 168: 21–33.
  43. Hibberd ML, Sumiya M, Summerfield JA, Booy R, Levin M (1999) Association of variants of the gene for mannose-binding lectin with susceptibility to meningococcal disease. Meningococcal Research Group. Lancet 353: 1049–1053.
  44. Roy S, Knox K, Segal S, Griffiths D, Moore CE, et al. (2002) MBL genotype and risk of invasive pneumococcal disease: a case-control study. Lancet 359: 1569–1573.
  45. Albiger B, Sandgren A, Katsuragi H, Meyer-Hoffert U, Beiter K, et al. (2005) Myeloid differentiation factor 88-dependent signalling controls bacterial growth during colonization and systemic pneumococcal disease in mice. Cell Microbiol 7: 1603–1615.
  46. Hartshorn KL, Crouch E, White MR, Colamussi ML, Kakkanatt A, et al. (1998) Pulmonary surfactant proteins A and D enhance neutrophil uptake of bacteria. Am J Physiol 274: L958–969.
  47. Lin Z, Pearson C, Chinchilli V, Pietschmann SM, Luo J, et al. (2000) Polymorphisms of human SP-A, SP-B, and SP-D genes: association of SP-B Thr131Ile with ARDS. Clin Genet 58: 181–191.
  48. Echchannaoui H, Bachmann P, Letiembre M, Espinosa M, Landmann R (2005) Regulation of Streptococcus pneumoniae distribution by Toll-like receptor 2 in vivo. Immunobiology 210: 229–236.
  49. Albiger B, Dahlberg S, Sandgren A, Wartha F, Beiter K, et al. (2007) Toll-like receptor 9 acts at an early stage in host defence against pneumococcal infection. Cell Microbiol 9: 633–644.
  50. Yamamoto N, Kawakami K, Kinjo Y, Miyagi K, Kinjo T, et al. (2004) Essential role for the p40 subunit of interleukin-12 in neutrophil-mediated early host defense against pulmonary infection with Streptococcus pneumoniae: involvement of interferon-gamma. Microbes Infect 6: 1241–1249.
  51. Read RC, Cannings C, Naylor SC, Timms JM, Maheswaran R, et al. (2003) Variation within genes encoding interleukin-1 and the interleukin-1 receptor antagonist influence the severity of meningococcal disease. Ann Intern Med 138: 534–541.
  52. Fang XM, Schroder S, Hoeft A, Stuber F (1999) Comparison of two polymorphisms of the interleukin-1 gene family: interleukin-1 receptor antagonist polymorphism contributes to susceptibility to severe sepsis. Crit Care Med 27: 1330–1334.
  53. Chen Q, Sen G, Snapper CM (2006) Endogenous IL-1R1 signaling is critical for cognate CD4+ T cell help for induction of in vivo type 1 and type 2 antipolysaccharide and antiprotein Ig isotype responses to intact Streptococcus pneumoniae, but not to a soluble pneumococcal conjugate vaccine. J Immunol 177: 6044–6051.
  54. Emonts M, Veenhoven RH, Wiertsema SP, Houwing-Duistermaat JJ, Walraven V, et al. (2007) Genetic polymorphisms in immunoresponse genes TNFA, IL6, IL10, and TLR4 are associated with recurrent acute otitis media. Pediatrics 120: 814–823.
  55. Schluter B, Raufhake C, Erren M, Schotte H, Kipp F, et al. (2002) Effect of the interleukin-6 promoter polymorphism (-174 G/C) on the incidence and outcome of sepsis. Crit Care Med 30: 32–37.
  56. Schmeck B, Zahlten J, Moog K, van Laak V, Huber S, et al. (2004) Streptococcus pneumoniae-induced p38 MAPK-dependent phosphorylation of RelA at the interleukin-8 promotor. J Biol Chem 279: 53241–53247.
  57. Gordon SB, Jarman ER, Kanyanda S, French N, Pridmore AC, et al. (2005) Reduced interleukin-8 response to Streptococcus pneumoniae by alveolar macrophages from adults with HIV/AIDS. AIDS 19: 1197–1200.
  58. Westendorp RG, Langermans JA, Huizinga TW, Elouali AH, Verweij CL, et al. (1997) Genetic influence on cytokine production and fatal meningococcal disease. Lancet 349: 170–173.
  59. Schaaf BM, Boehmke F, Esnaashari H, Seitzer U, Kothe H, et al. (2003) Pneumococcal septic shock is associated with the interleukin-10-1082 gene promoter polymorphism. Am J Respir Crit Care Med 168: 476–480.
  60. Gallagher PM, Lowe G, Fitzgerald T, Bella A, Greene CM, et al. (2003) Association of IL-10 polymorphism with severity of illness in community acquired pneumonia. Thorax 58: 154–156.
  61. Ling E, Feldman G, Dagan R, Mizrachi-Nebenzahl Y (2003) Cytokine mRNA expression in pneumococcal carriage, pneumonia, and sepsis in young mice. J Infect Dis 188: 1752–1756.
  62. Lynch JM, Briles DE, Metzger DW (2003) Increased protection against pneumococcal disease by mucosal administration of conjugate vaccine plus interleukin-12. Infect Immun 71: 4780–4788.
  63. Lauw FN, Branger J, Florquin S, Speelman P, van Deventer SJ, et al. (2002) IL-18 improves the early antimicrobial host response to pneumococcal pneumonia. J Immunol 168: 372–378.
  64. Paterson GK, Blue CE, Mitchell TJ (2005) Role of interleukin-18 in experimental infections with Streptococcus pneumoniae. J Med Microbiol 54: 323–326.
  65. Zwijnenburg PJ, van der Poll T, Florquin S, Akira S, Takeda K, et al. (2003) Interleukin-18 gene-deficient mice show enhanced defense and reduced inflammation during pneumococcal meningitis. J Neuroimmunol 138: 31–37.
  66. Stuber F, Petersen M, Bokelmann F, Schade U (1996) A genomic polymorphism within the tumor necrosis factor locus influences plasma tumor necrosis factor-alpha concentrations and outcome of patients with severe sepsis. Crit Care Med 24: 381–384.
  67. Mira JP, Cariou A, Grall F, Delclaux C, Losser MR, et al. (1999) Association of TNF2, a TNF-alpha promoter polymorphism, with septic shock susceptibility and mortality: a multicenter study. JAMA 282: 561–568.
  68. McGuire W, Hill AV, Allsopp CE, Greenwood BM, Kwiatkowski D (1994) Variation in the TNF-alpha promoter region associated with susceptibility to cerebral malaria. Nature 371: 508–510.
  69. Bernard GR (2003) Drotrecogin alfa (activated) (recombinant human activated protein C) for the treatment of severe sepsis. Crit Care Med 31: S85–93.
  70. Bernard GR, Vincent JL, Laterre PF, LaRosa SP, Dhainaut JF, et al. (2001) Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med 344: 699–709.
  71. Cundell DR, Gerard NP, Gerard C, Idanpaan-Heikkila I, Tuomanen EI (1995) Streptococcus pneumoniae anchor to activated human cells by the receptor for platelet-activating factor. Nature 377: 435–438.
  72. Geishofer G, Binder A, Muller M, Zohrer B, Resch B, et al. (2005) 4G/5G promoter polymorphism in the plasminogen-activator-inhibitor-1 gene in children with systemic meningococcaemia. Eur J Pediatr 164: 486–490.
  73. Westendorp RG, Hottenga JJ, Slagboom PE (1999) Variation in plasminogen-activator-inhibitor-1 gene and risk of meningococcal septic shock. Lancet 354: 561–563.
  74. Jeurissen A, Wuyts G, Kasran A, Ramdien-Murli S, Blanckaert N, et al. (2004) The human antibody response to pneumococcal capsular polysaccharides is dependent on the CD40-CD40 ligand interaction. Eur J Immunol 34: 850–858.
  75. Boudewijns M, Jeurissen A, Wuyts M, Moens L, Boon L, et al. (2005) Blockade of CTLA-4 (CD152) enhances the murine antibody response to pneumococcal capsular polysaccharides. J Leukoc Biol.
  76. Matute-Bello G, Liles WC, Frevert CW, Dhanireddy S, Ballman K, et al. (2005) Blockade of the Fas/FasL system improves pneumococcal clearance from the lungs without preventing dissemination of bacteria to the spleen. J Infect Dis 191: 596–606.
  77. Paul R, Angele B, Sporer B, Pfister HW, Koedel U (2004) Inflammatory response during bacterial meningitis is unchanged in Fas- and Fas ligand-deficient mice. J Neuroimmunol 152: 78–82.