Incidence rates of narcolepsy diagnoses in Taiwan, Canada, and Europe: The use of statistical simulation to evaluate methods for the rapid assessment of potential safety issues on a population level in the SOMNIA study

Background & objectives Vaccine safety signals require investigation, which may be done rapidly at the population level using ecological studies, before embarking on hypothesis-testing studies. Incidence rates were used to assess a signal of narcolepsy following AS03-adjuvanted monovalent pandemic H1N1 (pH1N1) influenza vaccination among children and adolescents in Sweden and Finland in 2010. We explored the utility of ecological data to assess incidence of narcolepsy following exposure to pandemic H1N1 virus or vaccination in 10 sites that used different vaccines, adjuvants, and had varying vaccine coverage. Methods We calculated incidence rates of diagnosed narcolepsy for periods defined by influenza virus circulation and vaccination campaign dates, and used Poisson regression to estimate incidence rate ratios (IRRs) comparing the periods during which wild-type virus circulated and after the start of vaccination campaigns vs. the period prior to pH1N1 virus circulation. We used electronic health care data from Sweden, Denmark, the United Kingdom, Canada (3 provinces), Taiwan, Netherlands, and Spain (2 regions) from 2003 to 2013. We investigated interactions between age group and adjuvant in European sites and conducted a simulation study to investigate how vaccine coverage, age, and the interval from onset to diagnosis may impact the ability to detect safety signals. Results Incidence rates of narcolepsy varied by age, continent, and period. Only in Taiwan and Sweden were significant time-period-by-age-group interactions observed. Associations were found for children in Taiwan (following pH1N1 virus circulation) and Sweden (following vaccination). Simulations showed that the individual-level relative risk of narcolepsy was underestimated using ecological methods comparing post- vs. pre-vaccination periods; this effect was attenuated with higher vaccine coverage and a shorter interval from disease onset to diagnosis. Conclusions Ecological methods can be useful for vaccine safety assessment but the results are influenced by diagnostic delay and vaccine coverage. Because ecological methods assess risk at the population level, these methods should be treated as signal-generating methods and drawing conclusions regarding individual-level risk should be avoided.


Introduction
In August 2010, a safety signal of narcolepsy following AS03-adjuvanted pdm(09)H1N1 influenza vaccine Pandemrix1 was reported in Finland and Sweden among children and adolescents [1]. Other rapid risk assessment studies conducted in the European Union (EU) did not show changes in incidence rates of narcolepsy diagnoses, except in Finland, Sweden, and Norway [2], all countries that achieved high coverage rates with Pandemrix. Subsequent hypothesis-testing studies showed associations; these had high within-and between-study variation [3]. In China, where vaccine coverage was very low, a 3-fold increase in narcolepsy onset was reported following the peak of the pandemic [4].
Narcolepsy is a rare disease with a long interval from onset of symptoms to diagnosis, especially in adults. Several possible explanations for the purported pdm(09)H1N1 and narcolepsy link have been proposed but none confirmed. Hypotheses range from a causal effect of the AS03 adjuvant, the manufacturing process, presence of nucleoproteins in Pandemrix, and molecular mimicry, to awareness and assessment biases, and residual confounding [5][6][7][8][9][10]. Based upon simulation studies conducted by Wijnans et al., in the absence of a causal association but in the presence of accelerated diagnosis due to awareness, we would expect to see an increased incidence of narcolepsy diagnosis following awareness of the association followed by a decrease, even to levels below the background incidence, due to depletion of cases [8]. This effect may be particularly important in conditions with a long delay to diagnosis such as in narcolepsy where the delay in diagnosis from initial symptoms can be 10-20 years [11]. The SOMNIA (Systematic Observational Method for Narcolepsy and Influenza Immunization Assessment) study was funded by the US Centers for Disease Control and Prevention (CDC) and used information from countries that used different types of adjuvanted pandemic influenza vaccines to assess whether the pdm(09)H1N1 influenza vaccine and specifically the MF59 and AS03 adjuvants were associated with narcolepsy.
One of the goals of SOMNIA was to assess patterns of incidence rates of narcolepsy in multiple geographic areas and to understand changes in incidence rates of narcolepsy diagnoses before, during, and after the pdm(09)H1N1 influenza pandemic by using electronic health care data, which may be rapidly available. In this paper, we explore whether assessment of safety signals based on ecological methods and population-based electronic health care data are suitable for vaccine safety risk assessment, by exploiting the heterogeneity in vaccine coverage, types of vaccines, and vaccination programs across countries. We assess what strength of signals can be detected using population-level data collected before and after a hypothetical targeted vaccination campaign.
Ecological studies can be defined as those that measure exposure and outcomes at the group level rather than at the individual level [12,13]. In such a study, groups are defined by a naturally occurring difference in space or time such as a change in the vaccination schedule [14] or the beginning and end of a targeted vaccination campaign [15].
This study may serve as an example of the utility of ecological methods to assess vaccine safety signals, particularly regarding events with long onset-to-diagnosis intervals.

Study population and follow-up
For data sources in which individual linkage can be made between population and diagnoses (all sites except Sweden and British Columbia, Canada), the study population comprised all individuals registered within each of the databases during the study period. Observation time began on the date of first registration of an individual in the database, the start of the study period (January 2003), or the start date of data collection for the database, whichever was the latest and ended on the date of death, the date registration was terminated, the end of data collection, or the end of the study period (December 2013), whichever was the earliest. Sweden and British Columbia, Canada used census data to calculate person-time denominators. We used a harmonized approach in which databases locally extracted their data into simple input files in a common format that could be locally analyzed and aggregated using SAS or JAVAbased software [2,19].

Case identification and validation
Cases were persons with a new diagnosis of narcolepsy with or without cataplexy. Validation of the diagnostic codes using patient discharge letters and medical records was conducted in Competing interests: MGS, as an employee of IDIAP Jordi Gol, worked on projects funded by pharmaceutical companies unrelated to this study. RM, as an employee of IDIAP Jordi Gol, worked on the GP databases in the Netherlands and Valencia, Spain. For these two sites, only validated cases were used in the analysis. The other sites used algorithms combining diagnostic codes for narcolepsy with claims for multiple sleep latency tests (MSLTs) to reduce the false positive rate. The same method was used at each site over the entire time period. No further validation was done in other sites (S1 Table).

Analysis
To investigate the purported narcolepsy-pandemic vaccine effect, incidence rates of narcolepsy diagnosis were calculated by calendar year and month and also categorized into three periods based on specific circulation/vaccination periods in each country: 1) pre-pandemic (from January 2003 until the start of the period of pH1N1 circulation); 2) during pH1N1 wild-type virus circulation until the start of the country's pH1N1 vaccination campaign; and 3) from the start of the pH1N1 influenza vaccination campaign through the end of the study (S1 Table). Pandemic H1N1 virus circulation was defined as the period during which weekly influenza test positivity for pH1N1 infection exceeded 10%. Dynamic age groups were categorized as <5 years, 5-19 years, 20-59 years, and �60 years at the time of diagnosis. This categorization was motivated by differences in diagnosis for each age group, and particularly the challenges of differential diagnosis in young children and the elderly [20,21]. Incidence rates of narcolepsy diagnoses were calculated by dividing the number of narcolepsy cases by the accumulated person-time. Ninety-five percent confidence intervals (CIs) were calculated assuming a negative binomial distribution. Following confirmation of homogeneity in incidence rates among databases within the same country, further analyses were conducted at the level of the country rather than the site. Within each country, we estimated incidence rate ratios (IRRs) and 95% CIs for each time period using Poisson regression, with the pre-circulation period as a reference. We included terms for age strata, time periods, and an age � time period interaction using time periods as defined by pH1N1 circulation and vaccination campaign dates.
We conducted additional analyses restricted to European countries to estimate the impact of vaccine coverage and adjuvant among children and adolescents and separately among adults. For this analysis, a composite variable summarizing vaccine coverage classified as low (<20%) or high (�20%) and adjuvant (MF59 or AS03) was created, and incidence in the period after vaccination had started was compared to the pre-pH1N1 circulation period. Because the composite adjuvant/coverage variable was collinear with database and country, neither database nor country was included in the European model.

Simulation
To better understand the utility of ecological methods for assessing vaccine safety signals and whether an association in one age group may be masked in a population-level analysis, we conducted statistical simulations. Each set consisted of 10,000 subjects aged 0 to 100 years, who were observed in the period January 1, 2003 to December 31, 2012. The period of the vaccination campaign was set from October to December 2009. In each simulated data set, vaccine coverage during this period was set at one out of 9 different values between 1 and 99 percent (see Table 1). Subjects assigned as vaccinated got a vaccination date during the campaign period. Baseline narcolepsy incidence was simulated based on reported estimates of incidence by decade of age [22].For vaccinated persons aged <20 years, during the six months following vaccination, the risk for narcolepsy onset was simulated using a relative risk varying from 0.5 to 10 compared to the baseline incidence. In subjects aged �20 years no increased risk was applied. The median time from onset to diagnosis was initially set at 4 years for adults and at 1.5 years for children (aged �18 years) based on SOMNIA data (not shown). The effect of the length of the interval from onset to diagnosis was also tested by varying a reduction rate parameter, with 0 removing the interval (i.e. immediate diagnosis following onset), 0.5 halving the interval, and 1 retaining the full simulated interval (see Table 1 for description of simulation parameters). Finally, in part of the sets the full 10 years was used as study period, in the others it was restricted to 6 months after the end of the vaccination campaign (7.5 years). For each set of simulation parameters, 500 replications were run. The population-level incidence rate ratio for the period following the vaccination campaign vs. the period prior to the vaccination campaign was estimated using Poisson regression. The median incidence rate ratio from the 500 replications was calculated overall and by age group, and plotted against vaccine coverage. The percentage bias was calculated by subtracting the simulated relative risk from the estimated IRR and dividing by the simulated IRR.

Calibration
Using the simulation results, a model was constructed to determine how the underlying individual-level relative risk was related to the median estimated IRR and the appointed vaccine coverage and interval reduction, and the interaction of these terms (Model: Simulated individual-level true RR = Estimated IRR + Simulated vaccine coverage + Simulated onset-to-diagnosis reduction + Interactions). This model was then applied to results obtained from the observed data, using the IRR found in the SOMNIA study and the relevant vaccine coverage and diagnostic delay (S1 Table), in order to calculate the underlying relative risks in 5-to 19-year olds, that would have been necessary to produce this IRR in the absence of other sources of bias. This calibration was restricted to the 5-19 year age group as this group was the source of the safety signal and the only subjects for whom an increased risk was simulated.
This study was conducted under the principles of the Helsinki declaration and each site was responsible for obtaining appropriate ethical approvals. Columbia, Canada and Denmark conducted the study under rolling approval and study-specific approval was not sought.

Observed incidence
Incidence rates of narcolepsy diagnoses ranged from 0.22 to 1.52 per 100,000 person-years by site ( Table 2). Incidence rates in databases within the same country (Canada and Spain) were similar so for further analysis country-specific data were pooled. Due to very low rates observed among the very young (<5 years) and the elderly (�60 years) (S2 Table) and known difficulties in diagnosis in these age groups, these were not included in further analyses. In Fig 1, IRs Table 3). In Sweden, where AS03-adjuvanted Pandemrix vaccine coverage was 60%, in the period after vaccination incidence rates among those aged 5-19 years and 20-59 years were elevated [IRR = 9.01 (95%CI 6.89, 11.80) and IRR = 1.69 (95%CI 1.46, 1.95)], respectively (Fig 1 &  Table 3). None of the other countries showed significant time-period-by-age-group interactions.
In the analysis restricted to Europe and including a vaccine coverage/adjuvant composite variable, IRRs were elevated in the period following start of vaccination in the high-coverage AS03 (Sweden) and low-coverage AS03 groups for children and adolescents (S3 Table). In adults, an elevated incidence in the period following vaccination was detected in the AS03 high-coverage group, which was limited to Sweden. In this analysis, no changes in the incidence of narcolepsy in the post-vaccination period were seen in sites using MF59-adjuvanted vaccine, all of which had low coverage (S3 Table).

Simulation
The simulation study showed that in an analysis such as the one described above, the true RR is consistently underestimated when it is greater than one and overestimated when less than one (Fig 2). Underestimation of true relative risks greater than one is attenuated as vaccination coverage increases but the estimate remains about 6%-26% too low even with vaccination coverage as high as 99% with no delay from onset to diagnosis and a 10-year observation period (Fig 3). As the interval from onset to diagnosis increases, the estimated relative risk deviates further from the true relative risk; performance is improved if the observation time captures only the period of increased risk (data not shown). When the time from onset to diagnosis was not reduced no increased risk was detected for any set of simulation parameters (data not shown). Stratification by age group was effective in elucidating the group that was the source of the increased risk.

Discussion
Evaluation of incidence rates on a population level can be done relatively quickly in countries/ regions with accessible population-based electronic health care databases. This is useful for assessing potential vaccine safety signals. In order to calculate rates quickly in a standardized manner, harmonization of data into simple input files in a common format allowed for the pooling and sharing of data across three continents. The method was capable of identifying the signal in Sweden in 5-to 19-year olds. In the analysis by country, elevated rates of narcolepsy were only detected in Taiwan during wild-type virus circulation through the period following vaccination with MF59-adjuvanted and non-adjuvanted vaccines, and in Sweden following vaccination with AS03-adjuvanted vaccines. The finding in Taiwan may be due to circulation of wild-type influenza virus prior to the start of the vaccination campaign [17]. This is consistent with the finding of a 3-fold increase in narcolepsy onset in China following the peak of the pH1N1 pandemic in a population with very low vaccine coverage [4]. Taiwan vaccinated children aged <1 year with IRR estimates in simulated data, immediate diagnosis (interval reduction rate = 0). Population-level incidence rate ratio estimated from simulated data with observation time equal to 3625 days and true individual-level relative risk equal to .05, 1, 2, 5, or 10 (columns). Gray horizontal reference lines represent the true simulated individual-level relative risk of narcolepsy diagnosis. The interval reduction rate parameter is set equal to zero (meaning immediate diagnosis following onset of symptoms). Vaccination coverage increases within each column along the x-axis. Colored lines represent age group-specific IRRs as noted in the legend and colored bands represent associated 95% confidence intervals. https://doi.org/10.1371/journal.pone.0204799.g002 Incidence rates of narcolepsy diagnoses in Taiwan, Canada, and Europe MF59-adjuvanted vaccine and adults and school children with mainly non-adjuvanted vaccine. In Sweden, where the signal of a narcolepsy safety concern was originally detected [23] and where patients diagnosed with narcolepsy are being compensated [7], rates were much higher than in the other countries. This could be due to differential reporting due to increased awareness of the putative association, a true causal effect in this population with this vaccine,  or some combination of these factors. As shown in simulations, reduction in the time from onset to diagnosis due to awareness of an association can lead to artificial inflations in risk estimates [8]. In Canadian provinces, with around 40% vaccine coverage of a different AS03-adjuvanted vaccine (Arepanrix TM ), no effect was seen in any of the age groups or periods. This study, which is by necessity observational, has several limitations. Data were collected according to a shared protocol but using locally derived algorithms, which may have led to differences in sensitivity and specificity. Case validation in some sites revealed low specificity of the original extraction, which may be the case in other sites as well. In our analysis by adjuvant and vaccine coverage, high coverage with AS03-containing vaccine was only present in Sweden, making Sweden and this adjuvant/coverage group collinear. This makes it impossible to determine whether we are seeing the effects of the vaccine itself or of the reporting and detection patterns in each country. Additionally, the manufacturing process of Arepanrix TM differed from that of Pandemrix TM , leading to vaccines containing different quantities of influenza virus components [24]. The potential effects of these differences in manufacturing cannot be differentiated from adjuvant specific-effects or from other country-specific effects using an ecological design such as the one presented here. Similarly, the countries in which MF59 was used were the same countries in which case validation was conducted. This limits comparability between these countries and others and, therefore, between MF59-containing vaccines and other pandemic vaccines. Differences in case ascertainment could also have impacted our estimates. For example, it has been noted elsewhere that the safety signal originated in Sweden and that this, together with compensation for cases, may have impacted diagnosis patterns [6,7]. Additionally, due to the healthcare system in Taiwan, children complaining of excessive daytime sleepiness are seen by a specialist quickly, making the interval from onset to diagnosis for these children shorter. A median time from symptom onset to MSLT referral of 60 days has been reported for pediatric narcolepsy cases in Taiwan [25]. While it is not possible to rule out a causal association, it is important to note that these factors undoubtedly contributed to the estimates obtained in this study. Differences in the prevalence of the underlying HLA-DQB1 � 06:02 risk allele for narcolepsy, which has been reported to vary widely by country, may affect the incidence at the population level but is unlikely to have affected relative risk estimates [26][27][28][29][30].
Ecological methods, when applied to assessment of a signal association with a targeted vaccination campaign and a disease with a potentially long interval from onset to diagnosis, can provide an unbiased estimate of vaccine-associated risk in a very limited set of circumstances. Obtaining an unbiased estimate in the absence of an association is possible even with very low vaccine coverage and a long onset to diagnosis interval. However, in the presence of a true vaccine-associated risk, all estimates will be biased toward one; this bias is reduced when cases are detected quickly and vaccination coverage is high. Based upon simulations, the estimates we obtained in the current study appear to be underestimations of true relative risks greater than one. However, the simulations did not take into account the possibility of increased reporting due to an awareness of the association, which has been shown in previous simulations to inflate risk estimates [8].
The predicted underlying individual-level relative risk obtained using models derived from simulated data, given the low vaccine coverage attained in most sites, are remarkably high. It is very unlikely that the true relative risk in Sweden, for example, is 36-fold. Calibrated estimates for The United Kingdom, however, are only slightly lower than those reported by Miller et al in their study of narcolepsy following Pandemrix TM vaccination in children aged 4-18 [31]. In general, these predicted relative risks are not in line with results found in the case-control study conducted within SOMNIA, in which no increased risk following pH1N1 vaccines was detected [32,33]. It is important to note that while the case-control data in the SOMNIA study did not include any pediatric cases exposed to Pandemrix TM , the case-coverage sub-study of pediatric cases in The Netherlands did not find an association with Pandemrix TM [33]. These inconsistencies may be an illustration of the ecological fallacy, namely that associations detected at the population level may not be causal at the individual level [34]. In fact, as coverage in our simulations approached 100%, our population-level analysis also approached an individual-level analysis with accurate exposure data for all subjects, explaining why increased vaccine coverage in simulations leads to more accurate estimates of the simulated relative risk.
Previous simulations have shown that reduction in the time from onset to diagnosis following awareness of an association increase risk estimates [8]. Our simulation did not take into account factors that may have changed over the course of the study period such as awareness of narcolepsy and of the pH1N1-narcolepsy association as well as changes in diagnostic and coding practices. Each of these likely contributed to the IRR estimates we obtained. What our simulations do show is that in the absence of factors that increase case detection in the postexposure period, detection of increased population-level risk of a disease with a long onset to diagnosis interval using ecological methods requires an extreme underlying individual-level risk.
The ecological approach fails to detect any increased risk unless the time from onset to diagnosis is short and both coverage and the true relative risk are high. Because of this, we recommend that population-level methods be used in assessment of outcomes with a delay from onset to diagnosis only to generate hypotheses or to strengthen signals when population-level exposure is high. Analysis of the full population when increased risk is only present in one age stratum performs as well as stratified analysis in terms of the magnitude of risk detected, but fails to identify the source of the increased risk. Therefore, if increased risk is suspected in a subset of the population, analyses should be stratified.

Conclusions
Ecological methods can be useful in assessment of vaccine safety but it is important for investigators to understand the impacts of masking by strata not at risk, patterns of onset and diagnosis, and vaccine coverage. What appears to be an estimate of no effect could be valid or, as shown in our simulations, could be an underestimation. 10. https://www.popdata.bc.ca/data. (DOCX) S2 Table. Incidence Rates and Incidence Rate Ratios by continent, country, age, and period. � Periods are as follows: Pre-Circulation = January 2003-the beginning of wild-type H1N1 circulation (defined per country); Circulation = Period from the beginning of wild-type H1N1 circulation until the start of the vaccination campaign (defined per country); Vaccination & Post = Period from the beginning of the vaccination campaign through December 2013. † IRR comparing the period to the pre-circulation period, within the age group ‡The count has been suppressed either because (1) the observed number of events is very small (n � 2) and not appropriate for publication; or (2) it could be used to calculate the number in a cell that has been suppressed. (DOCX) S3