A Genome-Wide Integrative Genomic Study Localizes Genetic Factors Influencing Antibodies against Epstein-Barr Virus Nuclear Antigen 1 (EBNA-1)

Infection with Epstein-Barr virus (EBV) is highly prevalent worldwide, and it has been associated with infectious mononucleosis and severe diseases including Burkitt lymphoma, Hodgkin lymphoma, nasopharyngeal carcinoma, and lymphoproliferative disorders. Although EBV has been the focus of extensive research, much remains unknown about what makes some individuals more susceptible to infection and to adverse outcomes of infection. Here we use an integrative genomics approach to localize genetic factors influencing levels of EBV nuclear antigen-1 (EBNA-1) IgG antibodies, as a measure of history of infection with this pathogen, in large Mexican American families. Genome-wide evidence of both significant linkage and association was obtained on chromosome 6 in the human leukocyte antigen (HLA) region and replicated in an independent Mexican American sample of large families (minimum p-value in combined analysis of both datasets is 1.4×10⁻¹⁵ for SNPs rs477515 and rs2516049). Conditional association analyses indicate the presence of at least two separate loci within MHC class II and, along with lymphocyte expression data, suggest genes HLA-DRB1 and HLA-DQB1 as the best candidates. The association signals are specific to EBV and are not found with IgG antibodies to 12 other pathogens examined, and therefore do not simply reveal a general HLA effect. We investigated whether SNPs significantly associated with diseases in which EBV is known or suspected to play a role (namely nasopharyngeal carcinoma, Hodgkin lymphoma, systemic lupus erythematosus, and multiple sclerosis) also show evidence of association with EBNA-1 antibody levels, finding an overlap only for the HLA locus, but none elsewhere in the genome. The significance of this work is that a major locus related to EBV infection has been identified, which may ultimately reveal the underlying mechanisms by which the immune system regulates infection with this pathogen.


Introduction
Epstein-Barr virus (EBV) belongs to the herpes virus family, which consists of double-stranded DNA viruses, composed of a DNA core surrounded by a nucleocapsid and a tegument, with relatively large genomes (100-200 genes). There are currently eight known human herpesviruses, which include herpes simplex virus type I, herpes simplex virus type II, varicella-zoster virus, EBV, cytomegalovirus, human herpesvirus 6, human herpesvirus 7, and Kaposi sarcoma herpesvirus. Infection with EBV is common, with over 90% of the world's adult population estimated to be infected [1]. EBV is thought to be typically transmitted through contact with saliva, infecting B lymphocytes and epithelial cells of the oropharynx [2]. The virus is shed consistently into saliva during primary infection, but intermittent shedding can continue for years afterwards. Following initial infection, EBV establishes a dormant, lifelong infection (mainly in memory B cells) and retains the potential to reactivate [3,4]. In developing countries primary EBV infection usually occurs during infancy or early childhood, and is typically without clinical symptoms or presents as a mild febrile illness. In more affluent countries infection in childhood is still common, but for approximately one third of individuals primary infection occurs during adolescence or early adulthood and is associated with a higher risk of infectious mononucleosis [5]. EBV infection has also been associated with some malignant conditions including Burkitt lymphoma, nasopharyngeal carcinoma, some gastric cancers, and Hodgkin lymphoma [6][7][8][9][10][11].
EBV has been studied extensively. It was the first human tumor virus identified, and its viral genome was the first to be fully described [12][13]. While genetic risk factors have been reported for particular EBV-infected subpopulations (e.g., individuals with Hodgkin lymphoma, infectious mononucleosis, and multiple sclerosis [14][15][16][17][18][19]), there is still much that remains unknown about inter-individual differences in antibody response to EBV exposure and potential adverse outcomes related to infection with this pathogen. Here we have quantified antibody titer to the Epstein-Barr virus nuclear antigen 1 (anti-EBNA-1), which reflects history of infection with this pathogen, in a randomly ascertained sample (i.e., one that is not enriched for a particular phenotype) of Mexican Americans and used genome-wide linkage and association analyses and expression profiling on lymphocytes to identify underlying genetic loci.

Seroprevalence
IgG antibodies to EBNA-1 were quantified in plasma samples from 1,367 randomly ascertained and naturally infected Mexican American participants of the San Antonio Family Heart Study (SAFHS), who represent 63 families, many of which are large genealogies (Table S1). While seropositivity to infection with EBV is sometimes measured using IgG antibodies against the viral capsid antigen (VCA), here we refer to EBV seropositivity as based on measurement of anti-EBNA-1 antibodies, which are produced during latent EBV infection. In this study, 48% of individuals are categorized as seropositive for anti-EBNA-1 antibodies, 24% have intermediate (seroindeterminate) levels, and 28% are seronegative. EBNA-1 seroprevalence is similar in men and women and does not change substantially with age within the examined age range of 16-94 years (Figure 1), indicating that most subjects underwent primary infection before adulthood. Given the prevalence of EBV infection and mode of transmission, we assume that essentially all individuals have been exposed to this pathogen multiple times during their lifetime, and therefore seronegative status should be informative in that those individuals failed to mount an antibody-mediated immune response to EBNA-1 (or mounted only a response so weak as to lead to undetectable antibody levels).

Heritability analysis
While the measured antibody assay generates a quantitative read-out of antibody levels, conceptually the antibody-mediated immune response may be viewed as a yes/no trait, in which the immune system did or did not mount an antibody response after exposure, and thus it may also be justifiable to discretize the assay results. We have performed all analyses presented below on both the quantitative antibody phenotype and the dichotomized serostatus phenotype. For clarity, in the text we pay more attention to the results with the quantitative phenotype. In addition, we view the quantitative trait as more informative and the statistical results to be more reliable, partly because the cut-off levels for discretization are somewhat arbitrary. However, the results for both traits are presented in the Figures and Tables, and the findings are highly consistent with one another.
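The heritability estimates reported in this section rest on a variance components model. A standard formulation for extended pedigrees with a household effect, given here as an assumed textbook form and not quoted from the model in [20], decomposes the phenotypic covariance matrix as follows, where Φ is the kinship matrix, H is an indicator matrix for shared households, and I is the identity matrix (all symbols are introduced here for illustration):

```latex
\Omega = 2\Phi\,\sigma^2_g + H\,\sigma^2_c + I\,\sigma^2_e,
\qquad
h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_c + \sigma^2_e}
```

Under this formulation, heritability h² is the share of phenotypic variance attributed to additive genetic effects, and the test of household effects asks whether σ²_c differs from zero.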
Heritabilities of the EBNA-1 serological phenotypes were estimated using a variance components approach, and shared environment was accounted for by using a "household" random effects model [20]. EBNA-1 serological measures are significantly heritable, at 43% (p ≈ 10⁻⁹) and 68% (p ≈ 10⁻⁹) for the quantitative antibody titer and discrete serostatus traits, respectively (Table 1), indicating that host-genetic factors are important determinants of immune response (as we have previously found to be the case for other pathogens [21]). However, household effects, which are one measure of shared environmental exposures and which were based on reported co-habitation at the time of blood draw for serological assaying, are not significant in this sample. Given the high prevalence of this pathogen, this observation may indicate that an individual is just as likely to be infected by someone who does not share the same residence as by someone who does.

Figure 1. EBV seroprevalence, based on measurement of anti-EBNA-1 antibodies, by sex and age for SAFHS. Sliding 15-year age windows are used to smooth the curves, and age shown is the midpoint of each age interval. doi:10.1371/journal.pgen.1003147.g001

Author Summary
Many factors influence individual differences in susceptibility to infectious disease, including genetic factors of the host. Here we use several genome-wide investigative tools (linkage, association, joint linkage and association, and the analysis of gene expression data) to search for host genetic factors influencing Epstein-Barr virus (EBV) infection. EBV is a human herpes virus that infects up to 90% of adults worldwide, and infection has been associated with severe complications including malignancies and autoimmune disorders. In a sample of more than 1,300 Mexican American family members, we found significant evidence of association of anti-EBV antibody levels with loci on chromosome 6 in the human leukocyte antigen region, which contains genes related to immune function. The top two independent loci in this region were HLA-DRB1 and HLA-DQB1, both of which are involved in the presentation of foreign antigens to T cells. This finding was specific to EBV and not to 12 other pathogens we examined. We also report an overlap of genetic factors influencing both EBV antibody level and EBV-related cancers and autoimmune disorders. This work demonstrates the presence of EBV susceptibility loci and provides impetus for further investigation to better understand the underlying mechanisms related to differences in disease progression among individuals infected with this pathogen.

Linkage and association analyses
To localize any underlying genetic factors, we performed a variety of genome-wide analyses using nearly 1 million available SNPs. The SAFHS consists of extended families that provide information on linkage as well as association. We therefore performed joint analysis of linkage and association, thereby using both information sources available in families in order to localize the responsible loci. After Bonferroni correction for the number of SNPs analyzed, multiple genome-wide significant SNPs (most significant p-values of 3.3×10⁻⁹ and 8.3×10⁻¹¹ for the quantitative and dichotomous serological trait, respectively) were found in the human leukocyte antigen (HLA) region on chromosome 6 (Figure 2, Table S2). The HLA region is also implicated by linkage analysis itself, with maximum LOD scores of 1.27 and 3.05 for the quantitative and discrete trait, respectively (Figure S1). No genome-wide significant evidence of linkage and/or association was found elsewhere in the genome.
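As a rough illustration of the Bonferroni step, the sketch below divides a family-wise error rate of 0.05 by the number of tests. The SNP count used (945,180) is an assumption, back-calculated from the genome-wide significance threshold of p ≤ 5.29×10⁻⁸ given in the notes to Table 3; the text itself says only "nearly 1 million". The function and variable names are illustrative.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed SNP count, back-calculated from the 5.29e-8 threshold (not stated in the text).
N_SNPS_ASSUMED = 945_180

threshold = bonferroni_threshold(N_SNPS_ASSUMED)

# Most significant joint linkage/association p-values reported for the two traits.
reported = {"quantitative": 3.3e-9, "dichotomous": 8.3e-11}
significant = {trait: p <= threshold for trait, p in reported.items()}
```

Both reported p-values fall below the threshold under this assumed SNP count, consistent with the genome-wide significance stated above.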
Having identified the HLA region as a major locus, we performed association analysis conditional on linkage, in order to separate out the association signal from individual SNPs (see Methods section for more detail). The most significant association p-value for the quantitative trait occurs with SNPs rs477515 and rs2516049 (p = 1.4×10⁻⁸) (Table 2), two SNPs that are in complete linkage disequilibrium (LD) with each other and are located within the HLA-DRB1 gene in the HLA class II region. The most significant result for the discrete trait is for SNP rs9268832 in HLA-DRB9 (p = 2.2×10⁻⁹). In all, 5 and 19 SNPs reach genome-wide significance for the quantitative and discrete EBNA-1 traits, respectively, and all are located in the HLA region of chromosome 6. As shown by the quantile-quantile plot, the HLA region accounts for the entirety of the deviation from the diagonal line of expected p-values under the null hypothesis, and there was no evidence of any inflation in p-values once the HLA region was excluded from the plot (Figure S2).
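The quantile-quantile check can also be summarized numerically. The sketch below computes the genomic inflation factor λ, a common companion to a QQ plot that is not reported in this paper: the ratio of the median observed 1-df chi-square statistic to its expected median under the null (about 0.455). A value near 1 indicates no inflation. The p-values used here are synthetic, and the function name is illustrative.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed chi-square (1 df) divided by its median under the null."""
    norm = NormalDist()
    # Convert each two-sided p-value to a 1-df chi-square statistic.
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    null_median = norm.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / null_median


# Synthetic p-values spaced evenly on (0, 1), i.e. what the null hypothesis predicts.
n = 1001
null_like = [(i - 0.5) / n for i in range(1, n + 1)]
lam = genomic_inflation(null_like)

# Squaring the p-values makes them systematically too small, which inflates lambda.
lam_inflated = genomic_inflation([p ** 2 for p in null_like])
```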

Replication
To confirm our findings, we generated anti-EBNA-1 antibody measurements (using the same assay) in plasma samples from 589 participants (representing 39 mainly extended families [Table S1]) in a separate Mexican American family cohort from San Antonio, the San Antonio Family Diabetes/Gall Bladder Study (SAFDGS) [22]. EBNA-1 seroprevalence in this cohort was estimated to be somewhat higher at 85%. Heritability estimates are 37% (p = 1.0×10⁻⁶) and 42% (p = 1.9×10⁻³) for the quantitative and discrete traits, respectively, nearly identical to the prior estimates from the SAFHS. Linkage analysis yielded LOD scores of 3.37 and 2.22 in the HLA region on chromosome 6 for the quantitative and dichotomous traits, respectively (Figure S3). Also, genome-wide joint linkage and association analysis points to significant results only for chromosome 6, and not elsewhere in the genome (Figure S4). We then performed association analysis conditional on linkage of all SNPs from the extended HLA region, and after Bonferroni correction for the number of SNPs analyzed we obtained significant evidence of association with the same two SNPs (rs477515/rs2516049) (most significant p-values of 4.1×10⁻⁶ and 8.8×10⁻⁷ for the quantitative and discrete traits, respectively) (Table 2).
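To put the LOD scores on the same footing as the association p-values, a LOD score can be converted to an approximate pointwise p-value. The sketch below uses a common large-sample convention for variance component linkage, in which 2·ln(10)·LOD is treated as a 50:50 mixture of a point mass at zero and a 1-df chi-square. That convention is an assumption here and is not stated in the text; the function name is illustrative.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate pointwise p-value for a variance-component LOD score."""
    if lod < 0:
        raise ValueError("LOD must be non-negative")
    if lod == 0:
        return 1.0
    chi2 = 2.0 * math.log(10.0) * lod        # likelihood-ratio statistic
    tail = math.erfc(math.sqrt(chi2 / 2.0))  # P(chi-square with 1 df > chi2)
    return 0.5 * tail                        # 50:50 mixture halves the tail area


# LOD scores reported for the HLA region in the SAFHS and the SAFDGS.
for lod in (1.27, 3.05, 3.37, 2.22):
    print(f"LOD {lod:.2f} -> pointwise p ~ {lod_to_pointwise_p(lod):.1e}")
```

These are pointwise values only; they do not include any correction for testing across the genome.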
Given the unambiguous replication of the HLA locus, including evidence of linkage and association in the extended HLA region in both datasets, we also performed joint analysis of both pedigree cohorts (SAFHS+SAFDGS) using all available SNPs in the extended HLA region, in order to use all available information for identification and prioritization of the best candidate SNPs within this region. The minimum p-values for association analysis conditional on linkage for the combined dataset were even more significant and occurred at the very SNPs that gave the most significant results for each dataset alone (p = 3.1×10⁻¹³ and p = 3.9×10⁻¹² for SNPs rs477515/rs2516049 for the quantitative and discrete traits, respectively) (Figure 3, Table 2).

Conditional association analyses
The HLA region is highly complex and exhibits considerable and long-range LD. In order to determine whether a single haplotype block or multiple blocks harbor genetic variants influencing the serological EBNA-1 phenotypes, we performed several rounds of conditional association analysis within the extended HLA region using the combined sample of both studies. We first conditioned on the most significant SNP for the quantitative trait (rs477515/rs2516049) and identified one additional significant SNP (rs2854275, located within the HLA-DQB1 gene in the MHC class II region) that was independently associated with EBNA-1 at a genome-wide level of significance (Table 3, Figure S5). After conditioning on both independent SNPs (rs477515/rs2516049 and rs2854275), no additional SNPs were significant for the quantitative antibody trait. This suggests that at least two haplotype blocks harbor variants influencing EBNA-1 seroreactivity. The pattern of LD among SNPs giving genome-wide significant association evidence with either the quantitative or the dichotomous EBNA-1 trait is shown in Figure S6.
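The idea behind conditioning on a top SNP can be sketched with ordinary least squares: regress the top SNP out of both the phenotype and the candidate SNP, then estimate the effect from what is left. This toy version ignores pedigree structure and linkage, which the actual analysis accounts for, and uses made-up genotypes and phenotypes; all function and variable names are illustrative.

```python
def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return [yi - mean_y for yi in y]
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return [(yi - mean_y) - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def conditional_effect(pheno, snp, top_snp):
    """Effect of snp on pheno after adjusting both for top_snp."""
    r_pheno = residualize(pheno, top_snp)
    r_snp = residualize(snp, top_snp)
    var = sum(r * r for r in r_snp)
    if var < 1e-12:
        return 0.0  # snp carries no information beyond top_snp (complete LD)
    return sum(a * b for a, b in zip(r_pheno, r_snp)) / var


# Made-up minor-allele counts (0, 1, 2) for nine individuals.
top_snp = [0, 1, 2, 0, 1, 2, 0, 1, 2]
proxy_snp = list(top_snp)                 # in complete LD with the top SNP
second_snp = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # uncorrelated with the top SNP
phenotype = [2.0 + 1.0 * t + 0.5 * s for t, s in zip(top_snp, second_snp)]

effect_proxy = conditional_effect(phenotype, proxy_snp, top_snp)
effect_second = conditional_effect(phenotype, second_snp, top_snp)
```

In this toy data the proxy SNP drops out after conditioning while the second SNP keeps its effect, which mirrors the reasoning used to argue for two independent signals.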

Expression profile analysis
In order to pinpoint the most likely gene(s) influencing anti-EBNA-1 antibodies, we used an integrative genomics approach based on available expression profiles from 1,243 peripheral blood mononuclear cell (PBMC) samples (collected at the same point in time as the plasma samples used for antibody assays) from SAFHS study participants. Specifically, we examined whether the SNPs that are significantly associated with anti-EBNA-1 antibody status are also significantly associated with expression levels of any nearby gene transcripts (which would suggest that such SNPs are putative cis-regulatory variants of these transcripts), and whether those transcript levels in turn are significantly associated with antibody status. Using the 41 SNPs that were significantly associated with EBNA-1 seroreactivity in the combined SAFHS+SAFDGS sample (with either the quantitative and/or discrete trait, and in the initial and/or subsequent association analysis, i.e. the SNPs included in Table 2 and Table 3), we conducted association analyses on the 150 expressed transcripts from the extended HLA region, yielding 6,750 SNP-transcript pairs. After Bonferroni correction, we observed significant association results (conditional on linkage) for four SNP-transcript pairs (Table S3). SNPs rs204999 and rs10947261 are significantly associated with expression of the housekeeping gene RPS18 (p = 4.8×10⁻⁵ and p = 3.0×10⁻⁴, respectively), and SNPs rs9273327 and rs2854275 are significantly associated with PBX2 expression. PBX2 is a gene involved in B-cell and certain T-cell leukemias. Our results indicate that these SNPs (or variants in LD with them) may be putative cis-acting regulators of these genes. However, the evidence of association is only of moderate strength for any SNP-transcript pair (the p-values are significant after Bonferroni correction for the number of transcripts analyzed within the extended HLA region, but not if one were to impose a genome-wide multiple testing correction). In addition, the distance between SNPs and probes is fairly large compared to commonly observed, strong cis-acting expression nucleotides. Other SNP-transcript pairs yielded suggestive, but not statistically significant, results after correction for multiple testing.
Table 2 note. Since we used a liability threshold model for analysis of the dichotomous trait (see Methods section), the direction of effect on EBNA-1 discrete serostatus is opposite of the sign of the regression coefficient, but is in the same direction as the regression coefficient for the quantitative trait (e.g. for SNP rs204999, the minor allele is associated with a decrease in EBNA-1 antibody level and seronegativity). doi:10.1371/journal.pgen.1003147.t002
We subsequently looked at whether any HLA transcripts were significantly correlated with EBNA-1 seroreactivity. However, the expression levels of RPS18 and PBX2, which we had found to be potentially cis-regulated by SNPs associated with EBNA-1 antibody phenotypes, were not significantly correlated with the EBNA-1 traits (p = 0.17 and p = 0.82, for RPS18 and PBX2 respectively, for the quantitative trait). Among the other HLA transcripts, the expression level of HLA-DRB1 is most significantly associated with both anti-EBNA-1 traits (quantitative: p = 2.8×10⁻⁵; discrete: p = 3.0×10⁻⁴) (Table S4). Thus, in conclusion, while our integrative genomic analyses point to potential candidate genes, the evidence obtained does not yield overwhelming support for a particular candidate gene. A potential explanation may be the fact that the relevant differences in HLA function are attributable to alteration in protein sequence or binding affinity rather than gene expression level.
Table 3 notes. The top SNP, rs477515/rs2516049, was determined based on analysis of the quantitative antibody trait in the combined sample (SAFHS and SAFDGS). Only the extended HLA region was analyzed. Shown are the SNPs yielding genome-wide significant p-values (p ≤ 5.29×10⁻⁸, shown in bold) with either the quantitative and/or the qualitative antibody phenotype in the combined sample. The regression coefficients refer to the estimated change in the phenotype for each dose of the rarer SNP allele. Note a: Since we used a liability threshold model for analysis of the dichotomous trait (see Methods section), the direction of effect on EBNA-1 discrete serostatus is opposite of the sign of the regression coefficient, but is in the same direction as the regression coefficient for the quantitative trait (e.g. for SNP rs204999, the minor allele is associated with a decrease in EBNA-1 antibody level and seronegativity). doi:10.1371/journal.pgen.1003147.t003

Examination of other pathogens
As the HLA region is well known to play a role in many aspects of the immune system, it may not be surprising that genetic variants therein appear to influence anti-EBNA-1 antibody levels. To examine whether the identified locus is specific to EBNA-1, or whether it plays a role in determining antibody titers more generally, we assessed whether the EBNA-1-associated SNPs are also significantly associated with antibodies directed at other herpesviruses or other pathogens. IgG antibody titers had previously been measured on the same plasma samples [23] for 12 other pathogens, including 5 herpesviruses. We focused on the top two independent SNPs associated with the quantitative EBV antibody titer in the SAFHS, and found no evidence of association of SNPs rs477515/rs2516049 or rs2854275 with antibody titers to any of these other pathogens, suggesting that the identified loci are specific to EBV (Table 4) and do not reflect a general HLA-mediated control of IgG levels.

Relationship to EBV-related cancer
Because several types of cancer have been linked to infection with EBV, we looked for evidence of genetic overlap between anti-EBNA-1 antibody traits and published susceptibility loci for two common EBV-related cancers, nasopharyngeal carcinoma (NPC) and Hodgkin lymphoma (HL). A comparison with Burkitt lymphoma was not made, however, as genetic association loci were not available in the published literature. The results of association analysis (conditional on linkage) of the EBNA-1 quantitative and discrete traits with NPC-related SNPs are presented in Table 5, and Table 6 presents the results for HL-related SNPs. Two NPC SNPs, rs2860580 and rs28421666, were significantly associated with the EBNA-1 traits after applying a Bonferroni correction to account for testing 23 SNPs. Both SNPs are located on chromosome 6, in HLA class I and II regions, respectively. Similarly, the four HL SNPs that were found to be significantly associated with an EBNA-1 trait (rs204000, rs9268542, rs2395185, and rs2858870) were also located in the HLA region. EBNA-1 traits were not significantly associated with cancer-related SNPs that are located outside the HLA region.

Relationship to autoimmune diseases
Because EBV infection has been associated with certain autoimmune diseases, in particular systemic lupus erythematosus (SLE) and multiple sclerosis (MS), we examined whether there is evidence for overlap of genetic factors influencing these traits and anti-EBNA-1 antibody traits by investigating whether any of the previously reported SNPs significantly associated with these disorders also show association with EBNA-1 antibody traits. Table 7 provides the results of the association analysis (conditional on linkage) for EBNA-1 quantitative and discrete traits with SLE-associated SNPs taken from the literature, and Table 8 focuses on SNPs associated with MS. After applying a Bonferroni correction for examining 47 SLE-relevant SNPs, rs9268832 and rs9271366 were significantly associated with both EBNA-1 serological traits, and rs3135391 with the discrete EBNA-1 trait. Notably, all three of these SNPs are located in the HLA region on chromosome 6. None of the 41 SLE-associated SNPs outside of the HLA region show any evidence of association with EBNA-1. A comparison with 30 genome-wide significant MS SNPs from published reports yielded similar results, with statistically significant association for both EBNA-1 traits with SNP rs9271366, and also for the discrete EBNA-1 serostatus trait with SNPs rs3129860 and rs3135388, all of which are located within the HLA region (Table 8). As with SLE, there was no evidence for association of EBNA-1 with MS-associated SNPs from anywhere else in the genome.

Discussion
In this study, we estimated the seroprevalence rate of EBV infection as 48% seropositive in the study population of 1,367 Mexican American participants from the SAFHS. Our estimate is lower than estimates of EBV prevalence for other adult populations [24], but this study characterized anti-EBNA-1 antibody titers, while many other estimates are based on measurements of IgG antibodies against EBV VCA. Typically, anti-VCA antibody titers will give a slightly higher estimate, as some anti-VCA positive individuals will subsequently fail to also make anti-EBNA-1 antibodies [25]. In addition, there may be other variations between assays and their cutoff values. When we include the indeterminate samples as seropositive in our analysis, the estimate increases to 70% EBV seropositivity, which is close to estimates for the U.S. general adult population of 73% to 90% [26]. In the replication study (SAFDGS), the same assay yielded a higher seroprevalence estimate (85%). The reason for this difference is unclear, but may be related to simple threshold effects that magnify differences between assays being run at two separate points in time when dichotomizing quantitative assay read-outs. A comparison of our heritability estimates for anti-EBNA-1 antibody titers (h² = 0.43 for the SAFHS, and h² = 0.37 for the SAFDGS) with estimates for anti-VCA antibody titers (h² = 0.32-0.48 [27]) shows that they fall within the same range.
Our study identified multiple, significant associations of anti-EBNA-1 antibody measures with genetic factors located in the HLA region, which contains genes related to immune function in humans. These associations were not found for seroreactivity to 12 other pathogens examined in this study, and therefore appear to be specific to EBNA-1. HLA class I genes are involved in the presentation of peptides (including viral antigens) from within the cell, which attract CD8+ T lymphocytes (cytotoxic T cells) to destroy cells presenting foreign antigens. Previous research identified genetic loci within this region that were associated with the development of infectious mononucleosis upon primary infection with EBV, suggesting that HLA class I polymorphisms influence T cell response during primary EBV infection and viral persistence [17]. The HLA class I region has also been implicated in the development of classical Hodgkin lymphoma among EBV-positive individuals [14][15][16]. HLA class II genes are involved in presenting peptides from outside the cell to CD4+ T lymphocytes (helper T cells), which in turn stimulate B cells to produce antibodies. In our study, genes significantly associated with anti-EBNA-1 antibody levels belong to HLA class II. This supports earlier reports of an association between HLA class II and EBV susceptibility among individuals with multiple sclerosis [17][18][19].
EBV primarily targets resting B cell lymphocytes, which are induced to proliferate, and also infects epithelial cells of the nasopharynx and oropharynx. EBV infects B lymphocytes via attachment to the target cell by binding of the viral major envelope glycoprotein gp350 to complement receptor type two, CD21, on the cell surface [28]. Subsequent penetration of the cell membrane requires a complex of three glycoproteins: gH and gL, which have functional homologs in other herpesviruses; and gp42, which is EBV-specific. Glycoprotein gp42 binds to HLA-DR on the host cell, and in this way HLA class II molecules serve as cofactors for EBV infection of B cells [29]. In our study, we demonstrate a significant association between EBV serostatus and SNP rs477515/rs2516049, which is located in gene HLA-DRB1 within the HLA-DR gene cluster, and the expression level of HLA-DRB1 is also significantly correlated with both EBNA-1 serological traits, though we did not observe evidence indicating that the particular EBNA-1-associated SNPs are likely cis-acting regulators of this gene. Nearby genes HLA-DRA and HLA-DRB9 do appear to be associated with significant EBNA-1 SNPs, but their expression levels are not significantly correlated with the examined antibody traits. Nonetheless, based on our results, these genes appear to be the best candidates for playing a role in EBV susceptibility in the study population. This may be related to viral penetration of B cells, but it is more likely due to specific haplotype and T cell recognition, as HLA class II genes are also involved in the presentation of viral antigens to T cells, which is important in suppressing proliferation of EBV-infected B cells.
HLA-DR is a class II cell surface receptor that serves as a ligand for the T cell receptor. The primary function of HLA-DR is the presentation of peptide antigens to the immune system, and it is closely linked to HLA-DQ, another molecule that presents antigens to T cells. In our study, after conditioning on the top SNP (rs477515/rs2516049) there was significant association of EBNA-1 serostatus and a second independent SNP (rs2854275) that is located within the HLA-DQB1 gene. After binding to the foreign antigen, T cells stimulate B-cells to produce anti-EBV antibodies. Our finding of significant SNPs located in the HLA-DR and HLA-DQ genes may relate to the efficiency of cell surface antigen presentation to T cell receptors in EBNA-1 seropositive individuals.
Under normal circumstances, the host immune system is capable of limiting the proliferation of EBV-infected B cells through natural killer (NK) cell and T cell responses. However, some copies of the virus will become latent in memory B cells, at a frequency of approximately 1 to 50 per 10⁶ cells in peripheral blood in healthy individuals [30]. Although most individuals will not experience clinical symptoms after EBV enters latency, a small percentage may develop cancer following a reactivation of infection. During active infection, the virus produces approximately 100 different viral proteins that are involved in viral replication and modulating the immune response in the host. However, during latent infection only about 10 proteins are produced by the virus, including EBNA-1. This protein, which was used in our study to quantify EBV antibody titer, correlates closely with past infection [31] and is expressed in all EBV-associated tumors. EBNA-1 is suggested to elicit poor CD8+ T cell response [32] and it is considered to be a universal viral oncogene [2].
The incidence of EBV-associated malignancies is higher in particular geographic locations as well as among certain ethnic groups, indicating that both environmental exposures and genetic factors are likely involved in disease risk. Cancers linked to EBV infection include: Burkitt lymphoma, which is prevalent in Africa and for which malaria may be a cofactor [6]; nasopharyngeal carcinoma, which is more common among individuals from South China [33,34]; Hodgkin lymphoma, which is reported to have a higher incidence in Hispanic and Asian/Pacific Islander populations [35]; parotid tumors in patients from Alaska [8]; and some gastric cancers [9]. EBV infection is also associated with post-transplant lymphoproliferative disorders [36]. Serological evidence points to high EBNA-1 antibody titers prior to the onset of clinical symptoms for several of these malignancies [7,37]. In our study, we found an overlap between EBNA-1 traits and NPC susceptibility loci in HLA-A and HLA-DR/DQ genes, and with HL loci in BTNL2 and HLA-DR genes. It has been suggested that the presentation of EBV-derived peptides is in some way involved in the pathogenesis of EBV-related cancer [16]. We also found suggestive evidence for association of top SNPs with the expression of EGFL8, which has previously been implicated in cancer progression, and NCR3, a gene that encodes a natural cytotoxicity receptor that may aid NK cells in lysing tumor cells.
Studies indicate that EBV may be implicated in the development of autoimmune disease, including systemic lupus erythematosus (SLE) and multiple sclerosis (MS) [38,39]. In SLE, patients may be unable to keep latent EBV infection in check, possibly due to defective T cell response to the virus. Molecular mimicry appears to play a role in SLE autoimmunity, as humoral immune response initially targets the proline-rich repeat motif PPPGMRPP, antibodies against which also cross-react with the EBNA-1 peptide PPPGRRP [40,41]. In our study we provide evidence for shared genetic factors that influence both anti-EBNA-1 antibody status and autoimmunity, as shown by a significant association between EBNA-1 serological measures and SLE SNPs rs3135391, rs9268832 and rs9271366, all located in the HLA region. SNP rs9268832 was both the top SNP associated with the EBNA-1 discrete serostatus trait in our study of Mexican Americans and the top SNP identified in a Spanish SLE cohort [42]. Given that this SNP lies within the HLA class II pseudogene DRB9, it has been suggested that this association is due to the composite effect of SNPs rs3130490 (located in the MSH5 gene) and rs3129768 (located between HLA-DRB1 and HLA-DQA1 genes). Dysregulation of the MSH5 gene, which plays a role in immunoglobulin class switching, allowing B cells to generate different classes of antibody but with the same specificity, is proposed to contribute to risk of developing SLE. While SNPs located within this gene were not statistically significant in our sample after adjusting for multiple testing, other genetic factors that appear to influence both EBNA-1 serostatus and SLE include HLA-DR and HLA-DQ loci, possibly due to mechanisms shared across various autoimmune and inflammatory diseases.
Susceptibility to MS was previously shown to be associated with HLA genes, and with HLA-DRB1 in particular [43], which in our study was significantly related to EBNA-1 serostatus. Indeed, our results, which are based on genome-wide investigation, support an earlier report of association between anti-EBNA-1 antibody titer level and HLA-DRB1 in an MS cohort [18]. That study observed that HLA-DRB1*15 positive individuals had a higher level of anti-EBNA-1 antibody titer and greater risk of developing MS, indicating that HLA genetic influence on MS risk may also involve control of EBV infection. In addition to HLA-DRB1, MS has been associated with changes in the expression of a number of other genes, including other HLA-DR genes and HLA-DQ genes [44]. We found evidence for a significant overlap between anti-EBNA-1 antibodies and three MS HLA SNPs (rs3129860, rs3135388, and rs9271366), which are associated with HLA-DR and HLA-DQ genes, and may be related to a more general autoimmune/inflammatory response.
In summary, the results of our study indicate that genetic determinants in the HLA region are important in the immune response to EBV and subsequent regulation of infection with this pathogen. Variation in EBNA-1 antibody titer among individuals may be due in part to variation in B cell permeability to EBV infection and/or differences in cell surface antigen presentation, as indicated by a statistically significant relationship between genetic factors within the HLA II region and EBV serostatus (defined here by the level of anti-EBNA-1 antibodies) that was not found for the other pathogens examined. Further investigation may reveal the underlying mechanisms by which these HLA genes potentially limit EBV viral load, possibly influencing risk for developing autoimmune disease or cancer in infected individuals.

Ethics statement
The study and protocols were approved by the Institutional Review Board at the University of Texas Health Science Center at San Antonio, and informed consent was obtained from all participants.

Study population
Individuals in this study included 1,367 members of extended, multi-generational families from the Mexican American community in San Antonio, Texas, and surrounding region. They were recruited during the years 1991-1995 for participation in the San Antonio Family Heart Study (SAFHS), which seeks to identify genetic risk factors for cardiovascular disease [45], and were ascertained without regard to any specific disease phenotype. Participants included 551 men and 816 women, who ranged in age from 16-94 years and represented 63 families (Table S1). These families had up to 6 generations and the largest consisted of 101 phenotyped individuals. Included in the study were 267 sibships, with an average size of 3.3 and size range of 2-12.
Significant findings were replicated in a separate sample of 589 Mexican Americans participating in the San Antonio Family Diabetes/Gallbladder Study (SAFDGS). The SAFDGS seeks to investigate the genetic influences underlying type II diabetes mellitus and gallbladder disease, and it is similar in design to the SAFHS except that recruitment was based on a single diabetic proband in each pedigree, and the sample is therefore enriched for diabetics [21,46]. This is a very weak form of ascertainment in Mexican Americans from San Antonio, where lifetime prevalence of diabetes approaches 30%. In fact, 20 years after the initiation of both studies, the prevalence rates of major diseases, such as heart disease, diabetes, and obesity, are not significantly different between these two component studies. The SAFDGS participants consisted of 39 families, representing up to 6 generations, and included 115 sibships, which ranged in size from 2-9 (average size of 3.2). Analyses were also run on the combined data set (SAFHS+SAFDGS), which included 1,956 phenotyped individuals. Familial relationships were confirmed based on genotype composition using PREST [47].

Sample collection and determination of serostatus
Blood samples were collected from participants after an overnight fast, at the time of recruitment (1991-1995) using EDTA vacutainers. Frozen plasma aliquots were obtained as previously described [48], along with the buffy coat for DNA extraction, and stored at −80°C. Plasma samples were thawed just prior to antibody determinations, and IgG antibodies to Epstein-Barr virus nuclear antigen 1 (EBNA-1) were measured using a commercially available enzyme-linked immunosorbent assay (ELISA) kit (IBL-America, Minneapolis, MN). Seropositive/seronegative status was determined according to the manufacturer's instructions using the following absorbance values: seronegative if ≤0.9; indeterminate if >0.9 and <1.1; and seropositive if ≥1.1. Antibody titers to 12 comparative pathogens were also obtained and included: Chlamydophila pneumoniae, Helicobacter pylori, Toxoplasma gondii, cytomegalovirus (CMV), herpes simplex type I virus (HSV-1), herpes simplex type II virus (HSV-2), human herpesvirus 6 (HHV-6), varicella zoster virus (VZV), adenovirus 36 (Ad36), hepatitis A virus (HAV), influenza A virus, and influenza B virus [23]. For Ad36, a previously described serum neutralization test was utilized for measuring antibodies [49], and analyses were run in duplicate, with specimens assigned as seropositive if both replicates had neutralization titers ≥1:8, otherwise they were considered to be seronegative. Serostatus for all other pathogens was determined using the same criteria as for EBNA-1 (i.e., seronegative if ≤0.9; indeterminate if >0.9 and <1.1; and seropositive if ≥1.1 [23]).
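The serostatus rules described above can be illustrated with a minimal sketch. The cut-offs are those quoted in the text (seronegative at or below 0.9, indeterminate between 0.9 and 1.1, seropositive at or above 1.1; Ad36 seropositive only when both replicates reach a titer of at least 1:8). The function names and the encoding of titers as reciprocal dilutions are assumptions made for illustration and are not taken from the kit documentation.

```python
def classify_serostatus(absorbance: float) -> str:
    """Classify an ELISA absorbance value using the cut-offs quoted in the text."""
    if absorbance <= 0.9:
        return "seronegative"
    if absorbance < 1.1:
        return "indeterminate"
    return "seropositive"


def classify_ad36(titer_rep1: int, titer_rep2: int) -> str:
    """Classify Ad36 serostatus from duplicate neutralization titers.

    Titers are given as reciprocal dilutions (8 means 1:8); this encoding is
    an assumption of the sketch. Both replicates must reach 1:8 or higher.
    """
    if titer_rep1 >= 8 and titer_rep2 >= 8:
        return "seropositive"
    return "seronegative"
```

Under the liability threshold model used later, samples classified as indeterminate would be dropped before the analysis of the discrete trait.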

SNP genotyping
DNA was extracted from the lymphocyte samples from study participants. SNPs were typed using several versions of Illumina's SNP genotyping BeadChip microarrays (HumanHap550v3, HumanExon510Sv1, Human1Mv1 and Human1M-Duov3), according to the Illumina Infinium protocol (Illumina, San Diego, CA). SNP genotype data underwent extensive processing prior to analysis: SNPs that had a low call rate, were monomorphic, or had fewer than 10 individuals carrying the minor allele were excluded from analysis. Additional SNPs were excluded if Hardy-Weinberg Equilibrium test statistics were equivalent to p ≤ 10⁻⁴ (calculated using SOLAR [50] while taking relationships properly into account), leaving a total of 944,565 SNPs for further analysis. Allele frequencies were computed using maximum likelihood estimates in SOLAR [50]. SNP genotypes were checked for Mendelian consistency using Simwalk [51]. MERLIN [52] was used to impute missing genotypes conditional on relatives' genotypes, with a weighted average of possible genotypes being used when an individual's genotype could not be inferred with certainty. Multipoint identity-by-descent (MIBD) matrices, based on a subset of 28,219 informative SNPs that were not in LD with one another, were calculated with LOKI [53]. The chromosomal maps used in the analyses were based on those generated by deCODE genetics [54].
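The SNP exclusion criteria above can be sketched as a single per-SNP filter. This is an illustration only, with three stated assumptions: the call-rate cut-off (0.95 here) is hypothetical because the text gives no value; genotypes are coded as 0, 1, or 2 copies of one allele, with None for a missing call; and the Hardy-Weinberg test is a simple chi-squared test for unrelated individuals, swapped in for the pedigree-aware test computed in SOLAR.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-squared Hardy-Weinberg p-value, treating individuals as unrelated."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genotypes, min_call_rate=0.95, min_minor_carriers=10,
              hwe_p_cutoff=1e-4) -> bool:
    """Return True if a SNP survives the filters described in the text.

    min_call_rate is a hypothetical value; the other two defaults are the
    thresholds quoted in the text.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    b_alleles = n_ab + 2 * n_bb
    if b_alleles == 0 or b_alleles == 2 * len(called):
        return False  # monomorphic
    # Count individuals carrying at least one copy of the rarer allele.
    carriers = n_ab + n_bb if b_alleles <= len(called) else n_ab + n_aa
    if carriers < min_minor_carriers:
        return False
    # Exclude SNPs with HWE p-value at or below the cut-off.
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) > hwe_p_cutoff
```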

Statistical analysis
Both anti-EBNA-1 quantitative antibody titer and discrete serostatus traits were analyzed. Statistical analyses of the sample of related individuals were performed using a variance components (VC) approach with the computer software package SOLAR [50]. Due to the sensitivity of VC analyses to non-normality, the quantitative antibody titer trait was transformed prior to analysis using an inverse, rank-based normalization to ensure a standard normal distribution of this phenotype. For the genetic analysis of the discrete EBNA-1 serostatus trait within a VC framework, a liability threshold model was used, in which serostatus was assumed to reflect an unobservable underlying quantitative liability, with individuals above a threshold being seropositive, and those below being seronegative [55,56]; individuals with indeterminate serostatus were excluded from analysis. All analyses included sex, age (at the time of sample collection), and their interactions as covariates. Although the SAFDGS was enriched for diabetics, diabetes status was not found to be a significant predictor of EBNA-1 antibody status and therefore was not included in further analyses. Narrow-sense heritability, or the proportion of phenotypic variance attributable to the aggregate effects of additive genetic variation, was estimated along with the influence of shared environmental factors, which were modeled using a "household" random effects component [20]. Individuals living together at the time of the blood draw were considered members of the same household. Details concerning the length of cohabitation, however, were not available. Because the SAFHS includes extended families, which provide information on both linkage and association, we performed several analyses in order to maximize the amount of information obtained from this sample.
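As an illustration of the inverse, rank-based normalization applied to the quantitative titer trait, the following sketch maps each value to a standard normal quantile based on its rank. The rank offset of 0.5 and the use of average ranks for ties are assumptions of this sketch; the text does not state which variant was used, and the function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard normal quantiles of their (tie-averaged) ranks.

    The offset and the tie handling are assumptions, not taken from the text.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]
```

Because only the ranks enter the transformation, the result is unaffected by skewness or outliers in the raw titers, which is the property that matters for variance components methods.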
We performed genome-wide linkage analysis, using MIBDs based on 28,219 SNPs, to identify regions of the genome that may harbor genetic variants influencing EBNA-1 serological traits. We also performed joint genome-wide linkage and association analysis, based on 944,565 SNPs that were available for 1,367 individuals, in order to have more power for localizing the responsible loci. The joint analysis was conducted under a VC model in which the linkage component was implemented as a regular VC-based random effects linkage model, and an additive measured genotype model was used for the association component. Twice the natural logarithm of the likelihood ratio of the joint linkage and association test was assumed to be distributed as a 50:50 mixture of chi-squared random variables with 1 and 2 degrees of freedom, respectively. In order to remove the long-distance linkage effect and home in on the shorter-range association signal, and thus achieve better differentiation among SNPs, we then performed association conditional on linkage for the extended HLA region (nucleotide positions 29,677,984 to 33,485,635), based on all 5689 available SNPs. As population substructure may result in spurious associations in GWAS, we corrected for this by using principal components analysis to model differences in ancestral contributions among study participants [57]. R princomp [58] was used to run the principal components analysis on a subset of 11,512 autosomal SNPs (determined to be in low mutual linkage disequilibrium [LD]) in 345 genotyped founders, and offspring were assigned PC values averaged over their parents, in order to not accidentally remove true pedigree differences. The first five principal components were included as additional covariates in all statistical analyses (these account for <3% of the variance in the genotype scores, indicating that there is in fact little evidence for stratification in the Mexican American cohort).
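To make the reference distribution of the joint test concrete, the sketch below converts a pair of log-likelihoods into a p-value under the stated 50:50 mixture of chi-squared distributions with 1 and 2 degrees of freedom. The function names are hypothetical, and the log-likelihoods are assumed to come from fitted null and joint models (for example, as reported by SOLAR); this is not the software's own implementation.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2)


def joint_test_pvalue(lnl_null: float, lnl_joint: float) -> float:
    """P-value for the joint linkage and association test.

    The statistic is twice the log-likelihood ratio, referred to a 50:50
    mixture of chi-squared distributions with 1 and 2 degrees of freedom.
    """
    stat = max(0.0, 2.0 * (lnl_joint - lnl_null))
    return 0.5 * chi2_sf_1df(stat) + 0.5 * chi2_sf_2df(stat)
```

For a test statistic of 4, for instance, the two tail probabilities are about 0.0455 and 0.1353, giving a mixture p-value of about 0.090.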
Given that there is a large amount of LD in the HLA region on chromosome 6, we ran multiple conditional analyses on the top EBNA-1 SNPs, in order to determine the number of independent, significantly associated SNPs. In addition, LD specific to the Mexican American study population, appropriately taking relatedness into account, was calculated using SOLAR [50], and regional plots based on this information were generated using LocusZoom [59].

Transcriptional profiles
Transcriptional profile data were available for PBMCs from 1,243 study participants, collected at the same time as the plasma samples used for EBNA-1 screening, as previously described [60]. Raw and normalized expression values are available under the accession number E-TABM-305 at: http://www.ebi.ac.uk/arrayexpress. Briefly, sample quality was examined by comparing the number of expressed probes (p ≤ 0.05), mean expression across expressed probes, and mean correlation (across expressed probes) with other samples, and 1,243 samples were deemed to give high quality expression profiles. Transcripts with significant expression at a false discovery rate (FDR) ≤ 0.05 were identified using a one-sided binomial test (based on counts of samples with successful and unsuccessful detection at p ≤ 0.05), yielding 20,634 significantly detected probes. Subsequently, we performed "background noise correction", log2 transformation, and quantile normalization.
We then tested whether SNPs that were significantly associated with EBNA-1 antibody measurements were also significantly associated with the quantitative expression levels of neighboring transcripts (i.e., whether the candidate SNPs are putative cis-regulatory variants), and whether those transcripts were in turn associated with EBNA-1 antibody measures. Prior to these analyses, in order to detect and remove the impact of suspected as well as unknown confounding variables on expression levels, we performed principal components (PC) analysis (after inverse, rank-based normalization of transcripts) on the expression profile data (details on methodology are being prepared for publication elsewhere). For detection of putative cis-acting expression quantitative trait nucleotides, the top 50 expression PCs were regressed out, followed by additive measured-genotype-based association analysis (conditional on linkage) on transcripts in the HLA region. Before correlating expression levels with anti-EBNA-1 traits, we examined (by regression analysis) the relationship between each of the top 50 expression PCs to the antibody traits, and regressed out all of the top 50 PCs except those that were significantly related to the antibody traits (so that we would not accidentally remove any true connection between expression and antibody traits).

[...] Table 2 and Table 3). Red indicates highly correlated SNPs. (TIF)

Supporting Information
Table S1 Information on pedigree relationships. Included are participants in the San Antonio Family Heart Study (SAFHS) and San Antonio Family Diabetes/Gallbladder Study (SAFDGS). (DOCX)
Table S2 Genome-wide joint linkage and association analysis. Shown are all SNPs yielding genome-wide significant p-values with either the quantitative and/or the qualitative antibody phenotype in the SAFHS. The regression coefficients refer to the estimated change in the phenotype for each dose of the rarer SNP allele. For the SAFHS all genome-wide significant results (p ≤ 5.29×10⁻⁸) are presented in bold lettering. After correcting for multiple testing during replication in the SAFDGS (we tested the entire HLA region, with 5689 available SNPs: p ≤ 0.05/5689 ≈ 8.79×10⁻⁶), 10 SNPs are significant for the replicate sample. When using the combined sample of both studies (SAFHS+SAFDGS), all SNPs originally significant in the SAFHS discovery sample are highly significant. (DOCX)

Table S3 Association, conditional on linkage, results for SNP-transcript pairs in the HLA region. Focus is on the 41 SNPs previously found to be significantly associated with EBNA-1 traits (presented in Table 2 and Table 3). Shown are results for the top 74 pairs (p ≤ 1.0×10⁻²). Only four SNP-transcript pairs were significant after adjusting for multiple testing. (DOCX)