Primary Epstein-Barr virus infection with and without infectious mononucleosis

Background
Infectious mononucleosis (IM) is a common adverse presentation of primary infection with Epstein-Barr virus (EBV) in adolescence and later, but is rarely recognized in early childhood, where primary EBV infection commonly occurs. It is not known what triggers IM, nor why IM risk upon primary EBV infection (the IM attack rate) seemingly varies between children and adolescents. IM symptoms may be severe and persist for a long time. IM also markedly elevates the risk of Hodgkin lymphoma and multiple sclerosis for unknown reasons. The way IM occurrence depends on age and sex is incompletely described and hard to interpret etiologically, because it depends on three quantities that are not readily observable: the prevalence of EBV-naïve persons, the hazard rate of seroconverting and the attack rate, i.e. the fraction of primary EBV infections that is accompanied by IM. We therefore aimed to estimate these quantities indirectly, to obtain epidemiologically interpretable measures of the dynamics of IM occurrence that may provide etiological clues.

Methods and findings
We used joint modeling of EBV prevalence and IM occurrence data to provide detailed sex- and age-specific EBV infection rates and IM attack rates and derivatives thereof for a target population of all Danes age 0-29 years in 2006-2011. We demonstrate for the first time that IM attack rates increase dramatically, and rather precisely in conjunction with typical ages of puberty onset. The shape of the seroconversion hazard rate for children and teenagers confirmed a priori expectations and underlined the importance of what happens at age 0-2 years. The cumulative risk of IM before age 30 years was 13.3% for males and 22.4% for females. IM is likely to become more common in years to come as EBV infection is delayed.

Conclusions
The change in attack rate at typical ages of puberty onset suggests that the immunologic response to EBV changes drastically over a relatively short age span. We speculate that these changes are an integrated part of normal sexual maturation. Our findings may inform further etiologic research into EBV-related diseases and vaccine design. Our methodology is applicable to the epidemiological study of any infectious agent that establishes a persistent infection in the host and the sequelae thereof.


Introduction
Most people are infected with Epstein-Barr virus (EBV) during childhood or adolescence, resulting in a persistent, mostly latent EBV infection. The primary EBV infection often manifests as infectious mononucleosis (IM), especially in adolescence [1,2]. Globally, EBV is causally linked to nearly 200,000 incident cancers and 18,000 deaths from multiple sclerosis annually [3,4], with IM elevating the risk of Hodgkin lymphoma and multiple sclerosis for unknown reasons [5-7]. The use of functions of EBV antibody levels as predictors of disease risk is an active field of research, see [8] and references therein. At the same time it is unclear why, upon primary EBV infection, some individuals present with IM while others do not [9]. Disease severity and duration correlate much better with e.g. CD8+ cell counts than with the viral kinetics itself, and the expansion of the CD8+ cell count is controlled in asymptomatic EBV infection despite virus loads similar to those experienced in symptomatic EBV infection [1,2,10-14]. Hence current understanding suggests that IM is caused by overreaction by the immune system, rather than by EBV infection per se (viremia or B-cell expansion). There now seems to be broad agreement that a massive expansion of the number of EBV-specific CD8+ cells is a characteristic of IM, while changes in the proportions of other cell populations seem less well established [1,2,10-13,15,16]. Clinically, IM is typically characterized by fever, pharyngitis, lymphadenopathy and fatigue. The IM symptoms are believed to be caused mostly, if not entirely, by the exaggerated CD8+ response [2,15,16]. Presumably IM is the same disease in teenagers as in children, because the immunological response to EBV infection is recognizably the same [10,17].
The way IM occurrence depends on age and sex is incompletely described and hard to interpret etiologically. The age distribution of incident IM is dominated by a distinct peak in the middle of the teenage years [18,19]. However, as an etiological clue this is not particularly useful, because the depicted rate is not really a rate, i.e. a number of IM cases divided by the time at risk of those who have not seroconverted. Rather it is a product of the prevalence of EBV-naïve persons, the hazard rate of seroconverting and the attack rate, i.e. the fraction of primary EBV infections that is accompanied by IM. Attack rates have only been estimated in young adults [20-23], and estimated seroconversion rates are practically non-existent too.
It would therefore be valuable to devise and fit a mathematically coherent model, projecting what would be the age- and sex-specific seroconversion rate and attack rate in a hypothetical population where the observed age- and sex-specific EBV prevalence and IM occurrence in the target population apply. Such a model could quantify e.g. how much of the IM teenage peak is due to changed behavior (changing hazard of seroconversion), and how much to changed susceptibility to IM (changing attack rate) in teenagers compared with preadolescents.
As proof of concept we therefore fitted such a model based on a few large data sets, with Danes age 0-29 years in 2006-2011 as our target population.

Materials
We used the Danish Civil Registration System [24] to follow up persons while resident in Denmark in 2006-2011 and of age 0-29 years for incident IM in the Danish National Patient Register (NPR) [25]. Incident IM in the NPR for a person was defined as the first hospital contact with IM as main, secondary, or underlying diagnosis, classified as code 075 in ICD-8 and code B27 in ICD-10 [19,26].
In 2010 and 2011 the Danish Blood Donor Study (DBDS) [27] asked participants: "Were you ever told by a doctor that you had infectious mononucleosis" and, if so, "At what age?". The study base of IM cases from DBDS was defined as DBDS participants who either 1) had reported IM in 2006-2011 at age 0-29 years or 2) had reported IM at age 0-14 years or 3) were IM cases in NPR at age 0-29 years at time of DBDS interview and born in 1976+. We then searched for these persons as IM cases in the NPR. Criterion 2) ensures that we can estimate IM hospitalization rates at age 0-17 years (donors must be 18+ years at interview) and criterion 3) makes a hospital diagnosis of IM equally valid as one proclaimed by a general practitioner.
EBV test results recorded in the Laboratory Information Management System of Statens Serum Institut, Copenhagen, Denmark were mapped into positive and negative results as in Rostgaard et al. [19]. The first result was retrieved for all persons serologically tested for primary EBV infection from January 2005 to May 2011. The serological test was based on measurements of IgG antibody titers to EBV nuclear antigen (EBNA) and IgG and IgM antibody titers to EBV capsid antigen (VCA). All measurements were performed using enzyme-linked immunosorbent assay (Biotest, BioNordika, Herlev, Denmark). Each test result was coded as 1) "prior infection" if EBNA was positive, 2) "positive" if VCA IgG or VCA IgM were "positive" or "weak" while at the same time EBNA was "negative" or "weak" and 3) "negative" if VCA IgG, VCA IgM and EBNA were all "negative". Any test result that did not match any of these three disjoint criteria was discarded [19]. The test results were obtained from analyzing test samples sent from hospitals and general practitioners from all parts of Denmark, but predominantly from Sealand [19].
For convenience we used a discrete survival model and hence lumped age into 61-day intervals denoted by $a = 0, 1, 2, \ldots, 179$, with $a = 180$ corresponding to approximately 30 years. The data were aggregated accordingly. In order to obey a data discretionary rule of at least 5 observations in a cell, the data were sorted by type, sex and age, and cells were then aggregated on a running basis to fulfil this criterion, with the age interval of an aggregated cell denoted by the rounded mean of $a$. The data were used in that form and are available in S1 Data. We did not use data from the first year of life due to the maternally derived EBV sero-positivity shortly after birth [17,28].
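The age discretization and the running aggregation of small cells described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually used to build S1 Data: the function names are hypothetical, a year is taken as 365.25 days, the "mean of a" is taken as the unweighted mean of the pooled interval indices, and a leftover tail below the threshold is folded into the preceding pooled cell.

```python
DAYS_PER_INTERVAL = 61     # interval length stated in the text
DAYS_PER_YEAR = 365.25     # assumption: average year length


def age_interval(age_days: float) -> int:
    """Index a of the 61-day interval containing an age given in days."""
    if age_days < 0:
        raise ValueError("age must be non-negative")
    return int(age_days // DAYS_PER_INTERVAL)


def interval_start_years(a: int) -> float:
    """Age in years at the start of interval a."""
    return a * DAYS_PER_INTERVAL / DAYS_PER_YEAR


def pool_cells(cells, min_count=5):
    """Pool consecutive (a, count) cells, sorted by a, until each pooled
    cell holds at least min_count observations.

    Assumptions (not stated in the text): the pooled cell is labelled by
    the rounded unweighted mean of its interval indices, and a leftover
    tail below the threshold is merged into the previous pooled cell.
    """
    groups, current = [], []
    for cell in cells:
        current.append(cell)
        if sum(n for _, n in current) >= min_count:
            groups.append(current)
            current = []
    if current:
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return [
        (round(sum(a for a, _ in g) / len(g)), sum(n for _, n in g))
        for g in groups
    ]
```

With these definitions, interval 180 starts at about 30.06 years, which is why $a = 180$ is read as roughly 30 years.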

Statistical methods
The statistical framework for this paper is a Markov model with the following states (see pages 1-25 and 457-475 in [29]):
0: EBV negative
1: EBV positive, no history of IM
2: EBV positive, a history of IM
We describe the dynamics of the system only in terms of age $a$ and sex $s$. Let $S(a,s)$ be the sex- and age-specific probability of being in state 0. Let the probability of moving from state 0 to state 1 or state 2 be $f_1(a,s)$ and $f_2(a,s)$, respectively. These are expressed in terms of the probability of being at risk in state 0, $S(a-1,s)$, the probability of moving out of state 0 if in that state, $(1+\exp(-\varepsilon_s(a)))^{-1}$, and the probability of presenting with IM upon seroconversion, $P(a,s) = (1+\exp(-\iota_s(a)))^{-1}$, i.e.
$f_1(a,s) = (1-P(a,s))\,(1+\exp(-\varepsilon_s(a)))^{-1}\,S(a-1,s)$
and
$f_2(a,s) = P(a,s)\,(1+\exp(-\varepsilon_s(a)))^{-1}\,S(a-1,s)$.
Let the probability of hospitalized IM among IM cases in DBDS be $(1+\exp(-\nu_s(a)))^{-1}$. Let p0, p1 and p2 be shorthand for the probability of being in state 0, 1, and 2. Similarly let imfrac and hospfrac be shorthand for the probability of IM upon seroconversion and hospitalization upon having IM.
The model was fitted using SAS proc HPNLMOD. The functions $\varepsilon_s(a)$ and $\nu_s(a)$ were modeled as fractional polynomials of degree 4 and 2, with power sets (-1,0,0.5,1) and (2,1), respectively (see pages 77-98 in [30]). Thus $\nu_s(a)$ was a second degree polynomial in $a$. These fractional polynomials sufficed to provide an adequate fit, according to goodness-of-fit tests and inspection of residuals. Preliminary analyses revealed that the $\iota_s(a)$ were complicated functions, requiring 8-12 degrees of freedom for an adequate fit. The functions $\iota_s(a)$ were modeled as restricted cubic splines (see pages 20-24 in [31]). The knots for the splines were common for the sexes and at the outset placed at deciles of the number of IM events in NPR. We then added knots at the 2.5, 5 and 7.5 percentile to obtain a satisfactory fit also in a region with few IM cases but much change in seroconversion rates. The resulting imfrac did not look as expected in the tail and differed markedly between the sexes. We considered this to be a consequence of model uncertainty regarding the post-teenage years in combination with the notorious wiggliness of high-dimensional splines. To remedy this we removed the two top knots, retaining an adequate model fit according to goodness-of-fit tests. Finally we fixed $\iota_s(a)$ to be constant above the new top knot, at the cost of an increase in deviance of 2.5-3 in each sex, in order to remove unrealistic decreasing trends above the top knot.
The link between model and data was provided by the following contributions to the model log-likelihood (ll):
for EBV prevalence data with POS positives among N tested: $ll = POS \cdot \log(p1+p2) + (N-POS) \cdot \log(p0)$
for DBDS data with POS hospitalized among N IM cases: $ll = POS \cdot \log(hospfrac) + (N-POS) \cdot \log(1-hospfrac)$
for NPR data with EVENTS IM cases in PYRS person-years at risk:
The construction of most of the graphs in
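The transition probabilities defined above determine the state probabilities p0, p1 and p2 by a simple forward recursion over age intervals. The Python sketch below illustrates this for one sex. It assumes that everyone starts in state 0 and that the probability of remaining in state 0 is reduced by the outflow in each interval, which follows from states 1 and 2 being the only destinations. The function names and the example logit values are hypothetical, and the code is not the SAS implementation used for the paper.

```python
import math


def expit(x: float) -> float:
    """Inverse logit, (1 + exp(-x))^-1."""
    return 1.0 / (1.0 + math.exp(-x))


def state_probabilities(eps, iota):
    """Forward recursion for the three-state model, for one sex.

    eps[k]  : logit of the probability of leaving state 0 in interval k
    iota[k] : logit of the probability of IM upon seroconversion (imfrac)
    Returns a list of (p0, p1, p2) after each interval, assuming
    p0 = 1 before the first interval.
    """
    p0, p1, p2 = 1.0, 0.0, 0.0
    trajectory = []
    for e, i in zip(eps, iota):
        leave = expit(e)                  # probability of seroconverting
        imfrac = expit(i)                 # attack rate in this interval
        f1 = (1.0 - imfrac) * leave * p0  # seroconversion without IM
        f2 = imfrac * leave * p0          # seroconversion with IM
        p0, p1, p2 = p0 - f1 - f2, p1 + f1, p2 + f2
        trajectory.append((p0, p1, p2))
    return trajectory


# Hypothetical logits for two intervals, for illustration only
example = state_probabilities([0.0, 0.0], [0.0, 0.0])
```

In each interval the three probabilities sum to one, so p1 + p2 is the EBV prevalence implied by the model.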
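To make the fractional polynomial terms concrete, the following Python sketch evaluates a fractional polynomial for a given power set, using the standard convention that power 0 denotes the natural logarithm. The coefficients, the intercept and any scaling of age are hypothetical. The text does not state how $a$ was scaled or how $a = 0$ was handled, so the sketch simply requires a positive argument.

```python
import math


def fp_term(x: float, power: float) -> float:
    """One fractional polynomial term; power 0 means log(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if power == 0 else x ** power


def fractional_polynomial(x, coefs, powers, intercept=0.0):
    """intercept + sum_j coefs[j] * x^(powers[j]), with x^(0) read as log(x)."""
    if len(coefs) != len(powers):
        raise ValueError("one coefficient per power is required")
    return intercept + sum(b * fp_term(x, p) for b, p in zip(coefs, powers))


# Power sets named in the text; coefficients are made up for illustration
POWERS_EPS = (-1, 0, 0.5, 1)   # degree 4, seroconversion logit
POWERS_NU = (2, 1)             # degree 2, hospitalization logit

eps_value = fractional_polynomial(4.0, [1.0, 1.0, 1.0, 1.0], POWERS_EPS)
nu_value = fractional_polynomial(3.0, [2.0, 1.0], POWERS_NU, intercept=1.0)
```

With the power set (2,1) the function reduces to an ordinary quadratic, in line with the remark that the hospitalization logit was a second degree polynomial in age.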
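The two binomial log-likelihood contributions given above (EBV prevalence data and DBDS data) can be written as small Python functions. This is a sketch with hypothetical function names. The contribution for the NPR data is not reproduced here because its expression is not shown above.

```python
import math


def ebv_prevalence_loglik(pos: int, n: int, p0: float, p1: float, p2: float) -> float:
    """ll = POS*log(p1+p2) + (N-POS)*log(p0) for one EBV prevalence cell."""
    return pos * math.log(p1 + p2) + (n - pos) * math.log(p0)


def dbds_loglik(pos: int, n: int, hospfrac: float) -> float:
    """ll = POS*log(hospfrac) + (N-POS)*log(1-hospfrac) for one DBDS cell."""
    return pos * math.log(hospfrac) + (n - pos) * math.log(1.0 - hospfrac)


# Illustration with made-up cell counts and model probabilities
ll_prev = ebv_prevalence_loglik(pos=3, n=10, p0=0.7, p1=0.2, p2=0.1)
ll_dbds = dbds_loglik(pos=3, n=10, hospfrac=0.3)
```

Both contributions are binomial log-likelihoods without the constant binomial coefficient, so they coincide when p1 + p2 equals hospfrac and the counts are the same.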

Assumptions
We assume that all persons start in state 0 at birth, i.e. we ignore that EBV can pass across the placenta during pregnancy [32]. Death, emigration etc. are considered non-informative censoring. The incubation time of around 42 days [12] from EBV infection to possibly overt IM is ignored. Since they are few, and not directly identifiable, we have not created a special state for persons who will remain EBV-negative [33,34], e.g. due to lack of the EBV receptor CD21 on B-cells [33]. Similarly, states 1 and 2 are absorbing, so we do not allow alternation between susceptible and non-susceptible states, suggested as possible by Helminen et al. [34], nor do we allow multiple EBV infections where the first did not cause IM but a later one did, i.e. we assume that once a latent EBV infection is established you cannot get IM caused by EBV. We assume that a person can have IM only once, e.g. that a person cannot have a second IM caused by e.g. cytomegalovirus. The data on IM incidence will usually be exaggerated due to lack of proper laboratory confirmation of EBV involvement in IM-like disease symptoms. Part of the problem is that only 90% of true IM is caused by EBV [35], that is the 0.9 in the expression for 'him' above.

Miscellanea
The risk of getting IM before age 30 years was calculated as $1-\exp(-H(a))$, where $H(a)$ is the cumulative population IM incidence rate at age $a$, i.e. the integral of the curves shown in Fig 1F. Estimates and confidence limits as presented in the figures were calculated from the predict logic of SAS proc HPNLMOD. In these calculations the leading coefficient of $\nu_s(a)$ was fixed to avoid inexplicable variance inflation in Fig 1H. The variance estimates in the other graphs were essentially unaltered by this fix.
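The cumulative risk formula can be illustrated with a short Python sketch that integrates a population IM incidence rate curve by the trapezoidal rule and converts the cumulative rate to a risk. The function name, the choice of integration rule and the constant rate in the example are assumptions for illustration only. They are not the estimated curves of Fig 1F.

```python
import math


def cumulative_risk(ages, rates):
    """1 - exp(-H), where H is the trapezoidal integral of rates over ages.

    ages  : increasing ages in years
    rates : population IM incidence rate per person-year at each age
    """
    if len(ages) != len(rates) or len(ages) < 2:
        raise ValueError("need at least two matching age/rate points")
    cumulative_rate = sum(
        0.5 * (rates[k] + rates[k + 1]) * (ages[k + 1] - ages[k])
        for k in range(len(ages) - 1)
    )
    return 1.0 - math.exp(-cumulative_rate)


# Hypothetical constant rate of 0.005 per person-year from age 0 to 30
risk = cumulative_risk([0.0, 15.0, 30.0], [0.005, 0.005, 0.005])
```

For this made-up constant rate the cumulative rate is 0.15, giving a risk of 1 - exp(-0.15), slightly below 0.15 because the risk cannot accumulate linearly.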
All statistical calculations were performed using SAS statistical software (SAS Institute, Cary, NC. version 9.4).
The study was approved by the institutional review board of Statens Serum Institut and the Scientific Ethics Committee Central Denmark (M2009237). As such it adheres to Danish law, including the European Union General Data Protection Regulation and is conducted according to the principles expressed in the Declaration of Helsinki. Written informed consent was obtained at enrollment into the DBDS [27], while specific informed consent for use of the other (register) data sources in this study was not needed according to Danish law.

Results
All Danes age 0-29 years resident in Denmark at some time during calendar years 2006-2011, in all 2,485,292 persons, were followed up in the same age and period range for a hospital contact with an IM diagnosis during 11,376,713 person-years of follow-up, yielding 4703 incidents of hospitalized IM. 2487 blood donors from The Danish Blood Donor Study, who had self-reported IM or had been hospitalized with IM under the right conditions (age and period, see Methods), were assessed for hospitalized IM to yield the fraction of hospitalized IM among IM cases (185/2487 = 7% of IM cases). 6145 persons tested for EBV antibodies at Statens Serum Institut at age 0-29 years during calendar years 2006-2011 yielded 3513 (57%) infected with EBV. The three statistically sufficient data sets for these three outcomes, which are the only data sets used for our analyses, are available in S1 Data, labelled in the type variable as NPR, DBDS and EBVPREV, respectively. The results of our modeling are a set of age- and sex-specific predictions, presented in Fig 1A-1H, and the same predictions in a slightly aggregated life-table format in Table 1, with columns labelled B to H. Throughout we shall only refer to the figures; the reader may consult the relevant columns of the table instead.
The EBV prevalence in our data set was generally lower than in older unselected Danish data [36,37], but otherwise similarly distributed (Fig 1A). Sex-specific corresponding proportions of EBV-naïve individuals are shown in Fig 1B. Both sexes experienced peaks in seroconversion rate as infants and as young adults (Fig 1C). The seroconversion rates for boys and girls were similar on the left side of the nadir in seroconversion rate, but girls had the highest rate to the right of the nadir (Fig 1C). The seroconversion rate peaked at age 17.2 years in females and at age 17.5 years in males.
The IM attack rate rose from practically nothing in children aged 0-2 years to represent a very common phenomenon in teenagers (Fig 1D). A peak in attack rate appeared in teenage years, and was especially pronounced among girls. The attack rate was higher in females than males throughout the teenage years. The attack rate peaked at age 16.3 years in girls and at age 17.3 years in boys and likewise the local minimum in attack rate to the left of the peak occurred at age 11.0 years in boys and at age 10.5 years in girls (Fig 1D).
For all ages the fraction of hospitalized IM cases was larger for males than for females. The fraction of IM cases becoming hospitalized was unimodal and typically low with a minimum of 6% at age 18.3 years and 4% at age 21.8 years for males and females, respectively (Fig 1E). Fig 1F, 1G and 1H contain what we call population rates. The denominator in these rates is time at risk for the entire population, not just the subpopulation of EBV-naïve persons.
The IM population hazard rate is the product of the seroconversion population hazard rate and the attack rate. The location and shape of the IM population hazard rate peak in teenage years (Fig 1F) was essentially determined by the attack rate (Fig 1D), which varied considerably more in this age span than the seroconversion population hazard rate (Fig 1G).
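The relation between the plotted quantities can be written compactly. The notation below is introduced here for illustration only and does not appear in the Methods: $\pi_0$ denotes the proportion of EBV-naïve persons (Fig 1B), $h$ the seroconversion hazard rate among the EBV-naïve (Fig 1C) and $A$ the IM attack rate (Fig 1D). The first line is our reading of what a population rate means, namely a rate whose denominator is the time at risk of the entire population.

```latex
\begin{align}
h^{\mathrm{pop}}_{\mathrm{sero}}(a,s) &= \pi_0(a,s)\, h(a,s), \\
h^{\mathrm{pop}}_{\mathrm{IM}}(a,s) &= h^{\mathrm{pop}}_{\mathrm{sero}}(a,s)\, A(a,s)
  = \pi_0(a,s)\, h(a,s)\, A(a,s).
\end{align}
```

Written this way, the shape of the IM population hazard rate over a given age span follows whichever factor varies most over that span.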
The combination of information in Fig 1C and 1D revealed several things. For children age 0-2 years the attack rate was low and the seroconversion rate high, as a priori expected from prevalence and rate data. For 3-12-year-old children the IM population hazard rate was kept low mainly by a small seroconversion rate, since the attack rate, relatively speaking, increased substantially compared to the attack rate in 0-2-year-old children. Comparing children age 4 to 5 years (the nadir of seroconversion) with teenagers age 16 to 17 years (the peak in IM attack rate), the seroconversion rate was lower by a factor of 6 to 8, while the corresponding attack rate was lower by a factor of 6 to 10. Accordingly, the low incidence of IM in 3-12-year-old children was roughly equally due to low attack rate and low seroconversion rate.

Discussion
Our analyses for the first time provide detailed and compelling evidence that the accumulation of IM among adolescents, a characteristic of western industrialized countries, reflects age-dependent variations both in IM attack rates and EBV seroconversion hazard rates. Both early childhood and adolescence are age-periods characterized by social behaviors involving the exchange of saliva, the primary route for EBV transmission, e.g. through sharing of toys and utensils in early childhood and through kissing in adolescence and early adulthood [38]. Deep kissing as the main route of EBV transmission in adolescence and beyond is well established [9], while the evidence for sharing of toys and utensils in early childhood as an important route of EBV transmission is weaker and indirect, e.g. a marked reduction in IM risk for each additional sibling, especially when the age-differential is small [19,39], presumably due to pre-teenage EBV infection.
To become infected, EBV-naïve individuals must interact with EBV-positive infectious individuals. Consequently, spreading of EBV depends on patterns of interaction between EBV-susceptible and EBV-infectious individuals and the likelihood of EBV transmission and infection at such encounters. In early childhood, when the vast majority of individuals are still EBV-naïve, EBV will spread rapidly because many encounters between these EBV-naïve children and EBV-positive parents/adults/same age children have the potential to create an EBV infection in the EBV-naïve child.
The steep decline in seroconversion hazard rates between ages 2 and 5 years is disproportionate to the decrease in EBV-susceptible individuals. Therefore, rather than the gradual reduction in proportions of susceptible and possibly also acutely infected (infectious) individuals, the decreasing risk of EBV infection and the plateauing sero-prevalence likely reflect age-related changes in behavior associated with lower risk of EBV transmission from both other children and parents/adults.
The second wave of EBV infection occurred in adolescence through early adulthood, with the highest EBV hazard rates occurring at slightly younger ages in females than in males. This may reflect earlier puberty in girls than in boys, and the typical age-disparity in female-male relationships with girls tending to partner with older boys [40]. Because of the age-dependent increase in EBV-sero-positivity, girls at any age would tend to engage with boys more likely to be EBV infected than boys their own age, whereas the opposite would be true for boys who would engage with younger girls, less likely to be EBV infected than girls their own age. Although of waning relevance in Western societies we also speculate that girls more often than boys expose themselves as caretakers to siblings and other children of age 0-3 years, whom we have identified as risk factors for IM and thus primary EBV infection [19]. All these interaction patterns would accelerate seroconversion in the female population and decelerate it in the male population.
The shape of the attack rate essentially complies with the dogma of IM being more frequent and severe the older the age at seroconversion [41-43], yielding something close to a monotone increase by age (Fig 1D).
The mechanisms underlying the age-dependent variation in IM attack rate have remained elusive, but proposed explanations include corresponding variations in mode and dose of infection and in host immune response [2,11,15,44]. The host immune response may vary by age for at least two reasons: 1) NK cell responses may assume greater importance, and perhaps be more effective, in combating virus infections early in life [15], and/or 2) adolescents infected with EBV may recruit large numbers of cross-reactive memory T cells previously created in response to other viral infections, which may more easily be activated but be less efficient in controlling the infection than primary responses from recruited naïve T cells [44].
However, in light of the very rapid change in IM attack rate, we do not consider cross-reactivity of memory T cells to be a likely major contributor to this change. Similarly Balfour et al. found no evidence of influenza-EBV dual specific CD8+ T cells in their study cohort to support this explanation [9,11]. Likewise both simulations [44] and observational studies [2,12,45] suggest that the initial viral load, and hence dose or mode of delivery is of little importance for IM risk.
Our results, including the fact that the adolescent attack rate peak among females occurs at a slightly younger age than the corresponding peak in males, instead point to IM susceptibility as somehow being subject to mechanisms that involve growth and/or sex hormones whose levels change as part of sexual maturation. In this regard, it is noteworthy that both estrogen and androgens are known to influence immune responses via epigenetic mechanisms, see [46] and references therein.

Strengths and weaknesses
We believe the serological data on prevalent EBV status to be accurate. They are based on enzyme-linked immunosorbent assays, which can perform very similarly to the gold standard of immunofluorescence assays [1,47-49]. However, the tested patients were not randomly sampled, and as such may yield a biased representation of the age- and sex-specific EBV prevalence in our population. Specifically, most persons in our sample were presumably tested in order to determine whether symptoms similar to IM could be due to an acute EBV infection. Furthermore, we suspect that many of the samples were sent for serological testing due to atypical IM symptoms or results of a quick but unreliable IM test that the general practitioner did not trust. As such one would expect to sample too many recently EBV-infected persons. On the other hand, comparison with older unselected Danish data sets suggests, if anything, that we have too few EBV-infected persons in our sample at a given age.
Secular changes, specifically the Danish society becoming more affluent, would tend to lower the age-specific sero-prevalence in our material compared to older Danish materials [1,50]. This could explain the discrepancy, and recent examples of such trends in other Western countries exist [11,41]. In Denmark the gradual increase in childcare attendance from around 1965 to 2000 [51] would tend to work in the opposite direction, but the effect is probably modest since the most common type of childcare for children age 0-2 years is by daycare mothers, i.e. caretakers taking care of only a few children. Currently only a third of a generation of children age 0-2 years attends an institution (creche, kindergarten or integrated institution), see statistikbanken.dk (http://statistikbanken.dk).
Altogether, we believe that our estimated seroconversion rates are sufficiently accurate to model the essential seroconversion dynamics in our target population.
For the purpose of attributing causes for IM in different age groups, it seems more important to get the ratios of age-specific attack rates within sexes, rather than the exact level, correct. We see no reason why our data or modeling should be noticeably biased with respect to assessing ratios of age-specific attack rates within sexes. Furthermore, our estimated attack rates around age 20 years are compatible with earlier detailed longitudinal studies on university students and army recruits [20-22], and do not de facto become 100% at any age, as would be the sign of a severe upwardly biased ascertainment of incident IM.
We believe the variation in the fraction of IM cases hospitalized to be a natural screening phenomenon. Specifically, we believe that general practitioners expect IM symptoms in teenagers to be caused by IM and therefore do not admit such cases to hospital, while the more unexpected and for children more non-specific IM symptoms [52-55] would cause general practitioners to admit a patient to hospital for further investigation more often. We do not know why the fraction of hospitalized IM cases is higher in boys than in girls; if anything, girls seem on average to have the most vigorous immune response as measured by EBV antibody titers [56-58]. Furthermore, there seemingly is no age gradient (age 6-17 years) in EBV antibody titers [58], supporting the view that the bathtub-shaped curves are a screening phenomenon, rather than due to physiology.
The cumulative risk of IM before age 30 years was 13.3% for males and 22.4% for females. This estimate is quite high compared to other estimates (approximately 5% with much variation) (Rostgaard et al. [26] and Table 5 in Hjalgrim [59]). We have no immediate explanation for this. However, we do not consider it surprising to have a substantially larger "lifetime" risk of IM in our target population than in the older and less affluent settings referred to above. E.g. the percentage of 15-17 year old EBV-naïve Americans increased from 22 to 31 over just 6 years (Table 2 in Balfour et al. [1]), which all else equal should increase the occurrence of IM in that age span by a factor of 31/22 = 1.41. If the percentage of EBV-naïve persons at the IM teenage peak was much lower in the past, a change in the occurrence of IM by a factor of 3 or 4 is certainly possible. Furthermore, the blood donors in our study being on average better educated and wealthier than non-donors [60] would suggest them to be recruited from affluent population strata and as such more prone to late EBV infection, and thus to presenting with IM, than the general population.

Conclusion
Conducting studies to predict the possible benefit of a specific EBV vaccine was one of five priorities outlined at an EBV-vaccine meeting organized by the US National Institutes of Health in 2011 [61]. The present study provides for the first time some of the knowledge needed for that purpose by precisely displaying at what age persons seroconvert and when it has consequences in terms of IM, with all the sequelae that go with that [3-7].
Mathematically the pair of descriptors (EBV hazard rate, IM attack rate) has the advantage compared with (EBV prevalence, IM incidence) of being more "local" in time, and therefore better suited to generation of causal interpretations and hypotheses, as causal mechanisms work locally in time, i.e. causes continually transmit their effects [62]. We think our study vindicates this point of view.
Methodologically we found it relatively easy to transform prevalence data into mathematically coherent and equivalent forms, primarily smooth hazard functions. We found these more informative than the raw prevalence data for, in this case, the dynamics of EBV infection. Our prevalence data were very detailed regarding age, but usually much cruder data would suffice for obtaining a model-based smooth hazard function. We believe that this type of analysis would be helpful in many future studies of the epidemiology of specific persistent infections.
Supporting information
S1 Data. The raw data for model fitting. (TXT)