
The authors have declared that no competing interests exist.

Conceived and designed the experiments: MVB DTB MK. Performed the experiments: EDB SI. Analyzed the data: DTB MVB. Contributed reagents/materials/analysis tools: DTB PT. Wrote the paper: DTB JW MVB.

Reliable discrimination of recent influenza A infection from previous exposure using hemagglutination inhibition (HI) or virus neutralization tests is currently not feasible. This is due to low sensitivity of the tests and the interference of antibody responses generated by previous infections. Here we investigate the diagnostic characteristics of a newly developed antibody (HA1) protein microarray using data from cross-sectional serological studies carried out before and after the pandemic of 2009. The data are analysed by mixture models, providing a probabilistic classification of sera (susceptible, prior-exposed, recently infected). Estimated sensitivity and specificity for identifying A/2009 infections are low using HI (66% and 51%), and high when using A/2009 microarray data alone (91% and 84%) or together with A/1918 microarray data (96% and 95%). As a heuristic, a high A/2009 to A/1918 antibody ratio (>1.05) is indicative of recent infection, while a low ratio is indicative of a pre-existing response, even if the A/2009 titer is high. We conclude that highly sensitive and specific classification of individual sera is possible using the protein microarray, thereby enabling precise estimation of age-specific infection attack rates in the population even if sample sizes are small.

Yearly epidemics of influenza A are the cause of a variable burden of disease that can be substantial in years with high influenza activity

Traditionally, the gold standard for detecting influenza infections is the use of paired serum samples, the first taken in the acute phase of infection and the second several weeks later. A significant (usually fourfold) increase in antibody titers is then taken as evidence of recent infection. In practice, however, such samples are both costly and logistically challenging to obtain. Consequently, residual or other single-time-point serum samples are often used instead, and classification is based on a high antibody titer in that single sample. Such classifications, however, may lack sensitivity, especially when it comes to distinguishing between persons who have been infected recently and persons who have been infected with similar viruses in the past.

Moreover, in comparative studies in which multiple antigens need to be tested, the traditional tests are laborious and require a significant amount of serum. Recent studies have made increasing use of novel diagnostic assays based on protein microarrays

In the Netherlands, two serological studies had been conducted before and after the H1N1 pandemic of 2009

The data are analysed using mixture models. In contrast to traditional analyses, which use a fixed cut-off value to assign each sample to a single class (susceptible, immune, recently infected), mixture models estimate the probability that a sample belongs to each of these classes. Hence, mixture models provide a natural way to include uncertainty in the classification procedure, and also enable investigation of optimal cut-off values

Two age-stratified population based surveys had been conducted in the Netherlands before and after the pandemic of 2009 ^{th} of October 2009, which marks the onset of sustained transmission in the Netherlands.

The aim of the earlier study was to obtain estimates of age-specific infection attack rates, and sera had been analysed with a hemagglutination inhibition (HI) test. Most of the samples in the earlier study tested negative by HI. To avoid drawing a random sample that consists mostly of HI-negative sera, we stratify the sampling procedure by HI titer. One group contains sera that tested negative, one group contains sera with a low to intermediate standardised HI titer (positive but <40; henceforth called intermediate titer), and one group contains all sera with an intermediate to high standardised HI titer (≥40; henceforth called high titer). This procedure stratifies the population by age, (standardised) HI titer, and survey (pre- versus post-pandemic). Two strata contain no data, as all persons aged 5–9 years tested negative in the pre-pandemic sample. For the remaining 28 groups we have drawn a random subset for analysis (
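The stratified draw described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the record layout (keys `survey`, `age_group`, `hi`), the coding of a negative HI result as `None` or 0, and the per-stratum subset size `n_per_stratum` are all hypothetical; the actual subset sizes are listed in the supplementary sample-size table.

```python
import random
from collections import defaultdict

def hi_stratum(hi_titer):
    """Assign a standardised HI titer to one of the three sampling strata.

    Assumption: a negative HI result is coded as None or 0.
    """
    if not hi_titer:
        return "negative"
    if hi_titer < 40:
        return "intermediate"  # positive but < 40
    return "high"  # >= 40

def stratified_subset(samples, n_per_stratum, seed=0):
    """Draw a simple random subset within each (survey, age group, HI stratum) cell.

    samples: list of dicts with hypothetical keys 'survey', 'age_group', 'hi'.
    n_per_stratum: hypothetical target size; smaller strata are taken whole.
    Strata without samples (e.g. HI-positive pre-pandemic 5-9-year-olds)
    simply do not appear in the result.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[(s["survey"], s["age_group"], hi_stratum(s["hi"]))].append(s)
    return {key: rng.sample(members, min(n_per_stratum, len(members)))
            for key, members in strata.items()}
```

Sampling within strata rather than from the whole survey guarantees that the rarer HI-positive sera are represented in the microarray subset, at the cost of having to account for the unequal sampling fractions when estimating population-level quantities.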

The study was approved by the Medical Ethical Testing Committee of Utrecht University (Utrecht, the Netherlands), according to the Declaration of Helsinki (protocol 66-282/E). Written informed consent was given by participants (or next of kin/caregiver in the case of children) for suitably anonymised clinical records to be used in this study.

The subset of sera from the original study was analysed with a microarray as described earlier

| Strain | Subtype |
|---|---|
| A/WS/1933 | H1N1 |
| A/New Caledonia/20/1999 | H1N1 |
| A/Brisbane/59/2007 | H1N1 |
| A/Canada/720/2005 | H2N2 |
| A/Aichi/2/1968 | H3N2 |
| A/Wyoming/2/2003 | H3N2 |
| A/Brisbane/10/2007 | H3N2 |
| A/Vietnam/1194/2004 | H5N1 |
| A/Chicken/Netherlands/1/2003 | H7N7 |
| A/Guinea fowl/Hong Kong/WF10/1999 | H9N2 |

We use a mixture model to provide a probabilistic classification of individual samples and estimate age-specific infection attack rates. The mixture model contains three component distributions that model the responses across age groups. The first distribution describes samples of low antibody titer, pertaining to susceptible persons. The second distribution describes samples of intermediate antibody titers and aims to identify persons that have pre-existing antibodies, hereafter named the immune component distribution. The third distribution describes samples of high titer, and aims to identify persons infected during the pandemic.

The susceptible and immune component distributions are fitted to pre- and post-pandemic data, while the infected component distribution is fitted to the post-pandemic data only. We assume that there are no age dependencies in the component distributions, and fit Gaussian distributions to the $\log_2$ antibody titers. We collect the means ($\mu_{\mathrm{sus}}$, $\mu_{\mathrm{imm}}$, and $\mu_{\mathrm{inf}}$) and standard deviations ($\sigma_{\mathrm{sus}}$, $\sigma_{\mathrm{imm}}$, $\sigma_{\mathrm{inf}}$) of the distributions in parameter vectors ($\theta_{\mathrm{sus}}$, $\theta_{\mathrm{imm}}$, $\theta_{\mathrm{inf}}$), and denote by

The weights of the distributions are determined by two mixing parameters per age group, _{a}_{a}_{a}_{a}-q_{a}_{a} and the local density of the infected component distribution, normalised by the sum of these quantities over all component distributions (susceptible, immune, infected).
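The classification rule described in the preceding paragraph (component weight times the local component density, normalised over the susceptible, immune, and infected components) can be sketched as follows. The weighting scheme is an assumption on our part: a fraction `q_a` of age group $a$ is taken to be susceptible before the pandemic, and a hypothetical parameter `p_a` is the fraction of those susceptibles infected during the pandemic, so that the post-pandemic susceptible and infected weights add up to the pre-pandemic susceptible weight, as they do in the age-specific estimates reported in the results. All function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density on the log2 titer scale."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_weights(q_a, p_a, survey):
    """Assumed component weights for age group a.

    Pre-pandemic: susceptible q_a, immune 1 - q_a, no infected component.
    Post-pandemic: a fraction p_a of the susceptibles has moved to 'infected'.
    """
    if survey == "pre":
        return {"sus": q_a, "imm": 1.0 - q_a, "inf": 0.0}
    return {"sus": q_a * (1.0 - p_a), "imm": 1.0 - q_a, "inf": q_a * p_a}

def class_probabilities(x, q_a, p_a, survey, params):
    """Probability that a sample with log2 titer x belongs to each component.

    params maps 'sus'/'imm'/'inf' to (mu, sigma) on the log2 scale.
    Each component's weight times its density at x, divided by the sum
    of these products over all components.
    """
    w = mixture_weights(q_a, p_a, survey)
    num = {k: w[k] * normal_pdf(x, *params[k]) for k in w}
    total = sum(num.values())
    return {k: v / total for k, v in num.items()}
```

Because the infected weight is zero in the pre-pandemic survey, a pre-pandemic sample can only be assigned to the susceptible or immune component under this parametrisation.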

The statistical analyses are based on maximization of the log-likelihood. In the following we denote by $n_{\mathrm{pre}}$ and $n_{\mathrm{post}}$ the number of samples in the pre- and post-pandemic survey, by $x_{\mathrm{pre}}(i)$ the $\log_2$ antibody titer of sample

The total log-likelihood is given by the sum of the pre- and post-pandemic log-likelihoods. In practice, the above formulations need to be adapted slightly to account for left-censoring of samples below the detection limit
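With left-censoring, a sample below the detection limit contributes the mixture's cumulative probability below that limit rather than a density value. A minimal sketch, assuming censored samples are coded as `None` and the detection limit is expressed on the same $\log_2$ scale as the data (e.g. $\log_2 20$ for the microarray, whose detection limit is 20); the function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def censored_loglik(log2_titers, weights, params, log2_limit):
    """Log-likelihood of one survey under a Gaussian mixture with left-censoring.

    log2_titers: observed log2 titers, with None for samples below the limit.
    weights: component weights, e.g. {'sus': ..., 'imm': ..., 'inf': ...}.
    params: component -> (mu, sigma) on the log2 scale.
    """
    ll = 0.0
    for x in log2_titers:
        if x is None:
            # Censored sample: mixture probability mass below the detection limit.
            lik = sum(w * normal_cdf(log2_limit, *params[k]) for k, w in weights.items())
        else:
            # Observed sample: mixture density at the observed titer.
            lik = sum(w * normal_pdf(x, *params[k]) for k, w in weights.items())
        ll += math.log(lik)
    return ll
```

The total log-likelihood is then the sum of this quantity over the pre- and post-pandemic surveys, each evaluated with its own component weights.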

To investigate whether classification of individual samples can be improved by the inclusion of a second antigen, we extend the univariate mixture model described above to a bivariate mixture model. The analysis of the extended model runs along the same lines as outlined above, the main difference being that each component distribution is now specified not by a single mean and standard deviation, but by two means, two standard deviations, and a correlation coefficient. Hence, the equations remain the same, but the parameter vectors contain five instead of two elements.
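With two antigens, each component becomes a bivariate Gaussian on the plane of $\log_2$ A/2009 and $\log_2$ A/1918 titers, and its five parameters can be read as two means, two standard deviations, and one correlation coefficient. A minimal sketch of that density, with a hypothetical tuple layout `(mu_x, mu_y, sigma_x, sigma_y, rho)`:

```python
import math

def bivariate_normal_pdf(x, y, theta):
    """Bivariate Gaussian density for one mixture component.

    theta = (mu_x, mu_y, sigma_x, sigma_y, rho): five parameters per component
    (hypothetical ordering), with x and y the log2 titers for the two antigens.
    """
    mu_x, mu_y, s_x, s_y, rho = theta
    zx = (x - mu_x) / s_x
    zy = (y - mu_y) / s_y
    one_minus_r2 = 1.0 - rho * rho
    # Quadratic form of the standardised deviations, corrected for correlation.
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    return math.exp(-0.5 * q) / (2.0 * math.pi * s_x * s_y * math.sqrt(one_minus_r2))
```

With `rho = 0` the density factorises into the product of two univariate Gaussians; a non-zero correlation lets a component follow the diagonal ridge that cross-reactive antibodies produce in the A/2009 to A/1918 plane.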

The mixture models are fitted using Markov chain Monte Carlo methods. Specifically, we use a random walk Metropolis algorithm with normal proposal distributions and the current value as mean
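A random walk Metropolis sampler of the kind described above can be sketched in a few lines. This is a generic illustration, not the authors' code: `log_post` stands for any function returning the log-posterior (for instance the censored mixture log-likelihood plus log-priors, returning `-math.inf` for invalid parameters such as negative standard deviations), and the per-parameter proposal standard deviations in `step` are hypothetical tuning values.

```python
import math
import random

def random_walk_metropolis(log_post, theta0, step, n_iter, seed=0):
    """Random walk Metropolis with independent normal proposals per coordinate.

    The proposal is centred on the current value; it is accepted with
    probability min(1, exp(log_post(proposal) - log_post(current))).
    Returns the chain (list of parameter lists) and the acceptance rate.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    lp = log_post(theta)
    chain = []
    accepted = 0
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, s) for t, s in zip(theta, step)]
        lp_prop = log_post(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            accepted += 1
        chain.append(list(theta))
    return chain, accepted / n_iter
```

Because the normal proposal is symmetric, the Hastings correction cancels and only the ratio of posterior densities enters the acceptance step; the step sizes are typically tuned so that a moderate fraction of proposals is accepted.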

There is a positive overall correlation between HI and the microarray response to A/2009 (Kendall's tau = 0·45, p-value <0·001). The correlation is stronger in the post-pandemic study (tau = 0·57, p-value <0·001) than in the pre-pandemic study (tau = 0·28, p-value <0·001), and is strongest in young children (5–9 years) in the post-pandemic study (tau = 0·77, p-value <0·001). A further comparison shows that 100 out of 357 samples (28%) test negative in HI but have a positive response in the microarray (
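Kendall's tau compares all pairs of samples and counts how often the two assays rank them in the same (concordant) or opposite (discordant) order. Because many titers sit at the detection limit, ties are common, and the tau-b variant corrects the denominator for them. A minimal pure-Python sketch with a hypothetical function name, quadratic in the number of samples, which is adequate for a few hundred sera; the text does not state which variant was used, and in practice `scipy.stats.kendalltau` (tau-b by default, with a p-value) would be the usual choice.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b between two paired sequences, e.g. HI and microarray titers."""
    n_pairs = concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        n_pairs += 1
        dx = (x1 > x2) - (x1 < x2)  # sign of the difference in x
        dy = (y1 > y2) - (y1 < y2)  # sign of the difference in y
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        return float("nan")  # all values tied in at least one variable
    return (concordant - discordant) / denom
```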

Data are stratified by study and age group (5–9, 10–19, 20–44, and 45–64 years). The bottom left corner in each panel shows the number of samples that tested negative in both assays. The top right corner shows Kendall's tau, a nonparametric correlation coefficient.

In young children (5–9 years) there is a perfect distinction between persons who were likely infected and those who remained susceptible. In fact, in the pre-pandemic study there are no young children with a positive test result in the A/2009 microarray, while 64% of the participants have a titer higher than 65 in the post-pandemic study (

The data are aggregated as follows: <20, 20–40, 40–80, 80–160, 160–320, and 320–640. Grey and red bars represent pre- and post-pandemic data, respectively. The solid and dashed lines represent the immune and infected component distributions, respectively. The cumulative probability of the mixtures below the detection limit of 20 is marked with black dots.

| Survey | A/2009 (H1N1) microarray titer | 5–9 years | 10–19 years | 20–44 years | 45–64 years |
|---|---|---|---|---|---|
| Pre-pandemic | >20 | 0 | 0·30 | 0·60 | 0·55 |
| | >40 | 0 | 0·23 | 0·33 | 0·37 |
| | >65 | 0 | 0·13 | 0·22 | 0·30 |
| Post-pandemic | >20 | 0·64 | 0·53 | 0·54 | 0·59 |
| | >40 | 0·64 | 0·43 | 0·42 | 0·36 |
| | >65 | 0·64 | 0·43 | 0·25 | 0·22 |
| Post − pre | >20 | 0·64 | 0·24 | −0·05 | 0·04 |
| | >40 | 0·64 | 0·20 | 0·10 | −0·01 |
| | >65 | 0·64 | 0·30 | 0·03 | −0·08 |

Subtracting pre-pandemic from post-pandemic prevalences yields rough estimates of the age-specific infection attack rates, suggesting that infection attack rates are high in young children (64%) and low in older adults (<4%)(
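The post-minus-pre rows of the preceding table can be recomputed from its pre- and post-pandemic rows. A small sketch using the rounded values as printed (the dictionary layout and function name are ours); from these rounded inputs a few differences come out one unit lower in the second decimal (e.g. 0·23 instead of 0·24 for 10–19-year-olds at >20), presumably because the published differences were computed from unrounded prevalences.

```python
# Prevalences above each A/2009 microarray titer threshold, by age group
# 5-9, 10-19, 20-44, 45-64 years (rounded values from the table).
pre = {">20": [0.00, 0.30, 0.60, 0.55],
       ">40": [0.00, 0.23, 0.33, 0.37],
       ">65": [0.00, 0.13, 0.22, 0.30]}
post = {">20": [0.64, 0.53, 0.54, 0.59],
        ">40": [0.64, 0.43, 0.42, 0.36],
        ">65": [0.64, 0.43, 0.25, 0.22]}

def crude_attack_rates(pre, post):
    """Post-pandemic minus pre-pandemic prevalence per threshold and age group."""
    return {thr: [round(b - a, 2) for a, b in zip(pre[thr], post[thr])]
            for thr in pre}

print(crude_attack_rates(pre, post))
# {'>20': [0.64, 0.23, -0.06, 0.04], '>40': [0.64, 0.2, 0.09, -0.01], '>65': [0.64, 0.3, 0.03, -0.08]}
```

Small negative differences in adults are expected from sampling noise when the true attack rate is close to zero, which is one reason to prefer the mixture-model estimates over this crude subtraction.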

Estimated probability (95% CI)

| Age (years) | Component | Univariate microarray | Bivariate microarray | Univariate HI |
|---|---|---|---|---|
| 5–9 | Susceptible (pre) | 1·00 (0·79;1·00)† | 1·00 (0·84;1·00)† | 1·00 (0·52;1·00)† |
| | Susceptible (post) | 0·35 (0·12;0·61) | 0·37 (0·06;0·59) | 0·40 (0·01;0·64) |
| | Immune | 0·00 (0·00;0·21)† | 0·00 (0·00;0·16)† | 0·00 (0·00;0·48)† |
| | Infected | 0·65 (0·31;0·82) | 0·63 (0·36;0·89) | 0·60 (0·25;0·82) |
| 10–19 | Susceptible (pre) | 0·63 (0·37;0·82) | 0·44 (0·24;0·66) | 0·76 (0·21;0·98) |
| | Susceptible (post) | 0·36 (0·12;0·56) | 0·22 (0·03;0·40) | 0·56 (0·02;0·72) |
| | Immune | 0·37 (0·18;0·63) | 0·56 (0·34;0·76) | 0·24 (0·02;0·79) |
| | Infected | 0·27 (0·11;0·49) | 0·22 (0·08;0·46) | 0·20 (0·11;0·61) |
| 20–44 | Susceptible (pre) | 0·30 (0·06;0·43) | 0·07 (0·04;0·20) | 0·89 (0·22;0·97) |
| | Susceptible (post) | 0·25 (0·02;0·36) | 0·00 (0·00;0·10) | 0·81 (0·09;0·89) |
| | Immune | 0·70 (0·57;0·94) | 0·93 (0·80;0·96) | 0·11 (0·03;0·78) |
| | Infected | 0·05 (0·00;0·15) | 0·07 (0·03;0·15) | 0·08 (0·02;0·26) |
| 45–64 | Susceptible (pre) | 0·22 (0·05;0·40) | 0·12 (0·04;0·24) | 0·91 (0·17;0·97) |
| | Susceptible (post) | 0·22 (0·02;0·37) | 0·07 (0·00;0·18) | 0·88 (0·07;0·91) |
| | Immune | 0·78 (0·60;0·95) | 0·88 (0·76;0·96) | 0·09 (0·03;0·83) |
| | Infected | 0·00 (0·00;0·10)† | 0·05 (0·00;0·14) | 0·03 (0·01;0·20) |

† One-sided confidence interval.

Black and red dots represent the data, i.e., the pre- and post-pandemic samples, respectively. Contours indicate the densities of the immune and infected distributions weighted by the estimated age-specific infection probabilities (

We further evaluated the diagnostic characteristics of the microarray by analysing classification of post-pandemic sera. In general, classification is most precise in the bivariate microarray (

Shown are the estimated probabilities of infection in the univariate mixture (dots and solid line) with associated 95% confidence envelope (shaded area), and for each sample the corresponding estimates in the bivariate mixture (triangles) with associated 95% confidence intervals (bars).

True infection statuses are unknown in the post-pandemic survey, but we can safely assume that pre-pandemic samples do not belong to persons who have been infected with A/2009. We exploit this fact to investigate how many pre-pandemic samples would be misclassified as infected. Each sample in the pre-pandemic survey has an estimated infection probability, and we report the expected number of misclassifications, i.e., the sum of the infection probabilities over all positive pre-pandemic samples. The bivariate microarray yields the lowest percentage of misclassifications (8·6 out of 64; 13%), followed by the univariate microarray (18·1 out of 64; 28%), and the HI analysis (16·1 out of 38; 42%).
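The expected number of misclassifications is simply the sum of the estimated infection probabilities of pre-pandemic samples, all of which are assumed to be truly uninfected with A/2009. A minimal sketch (the function name is hypothetical), followed by a check of the percentages quoted above:

```python
def expected_misclassification(infection_probs):
    """Expected number and fraction of pre-pandemic samples classified as infected.

    infection_probs: estimated A/2009 infection probabilities of the positive
    pre-pandemic samples, which are assumed to be truly uninfected.
    """
    expected = sum(infection_probs)
    return expected, expected / len(infection_probs)

# Reported (expected count, number of positive pre-pandemic samples).
reported = {"bivariate microarray": (8.6, 64),
            "univariate microarray": (18.1, 64),
            "HI": (16.1, 38)}
for name, (expected, n) in reported.items():
    print(f"{name}: {100 * expected / n:.0f}%")  # prints 13%, 28%, 42% in turn
```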

Overall comparison of classifications is investigated in a receiver operating characteristic (ROC) diagram, taking different cut-off values for positive classification (HI and univariate microarray), or taking different values of the A/1918 to A/2009 ratio for positive classification (bivariate microarray) (

The cut-offs that maximise the sum of sensitivity and specificity give a sensitivity and specificity of 66% and 51% for HI (at a standardised HI titer of 44), 91% and 84% for the univariate microarray (at a microarray titer of 97), and 96% and 95% for the bivariate microarray (at an A/1918 to A/2009 ratio of 0·95).
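The ROC analysis and the choice of cut-off can be sketched as a sweep over candidate cut-offs, keeping the one that maximises sensitivity plus specificity (Youden's J). The text does not say how reference statuses were obtained for post-pandemic sera; the sketch therefore accepts either 0/1 labels or, as an assumption on our part, mixture-model infection probabilities used as soft labels. Function names are hypothetical. For the bivariate rule, where a low A/1918 to A/2009 ratio indicates infection, one would pass the reciprocal A/2009 to A/1918 ratio as the score; a cut-off of 1/0·95 ≈ 1·05 on that scale corresponds to the heuristic quoted in the abstract.

```python
def roc_points(scores, labels):
    """Sensitivity and specificity at every observed cut-off.

    A sample is called infected when score >= cut-off. labels are reference
    statuses: 0/1, or (assumption) probabilities of infection as soft labels.
    Returns a list of (cut-off, sensitivity, specificity).
    """
    pos_total = sum(labels)
    neg_total = sum(1.0 - p for p in labels)
    points = []
    for cut in sorted(set(scores)):
        tp = sum(p for s, p in zip(scores, labels) if s >= cut)
        tn = sum(1.0 - p for s, p in zip(scores, labels) if s < cut)
        points.append((cut, tp / pos_total, tn / neg_total))
    return points

def best_cutoff(scores, labels):
    """Cut-off maximising sensitivity + specificity (Youden's J)."""
    return max(roc_points(scores, labels), key=lambda pt: pt[1] + pt[2])
```

Usage: `best_cutoff([1, 2, 3, 4], [0, 0, 1, 1])` returns `(3, 1.0, 1.0)`, since a cut-off of 3 separates the two classes perfectly.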

Using mixture model analyses of two population-based serological studies

Our analyses show that classification of sera belonging to persons infected with A/2009 (H1N1) works best when the A/2009 and A/1918 antigens are used together. The explanation is that in the univariate analysis the component distribution of infected persons overlaps considerably with the immune component distribution. Incorporating A/1918 in the analysis substantially reduces the overlap of the two distributions, resulting in higher estimated specificity and sensitivity in the bivariate than in the univariate analysis (

The microarray measures antibody binding and the observed antibody responses are not necessarily protective. It is known, however, that positive responses in the microarray correlate with protection against infection

In our analyses the estimated susceptible component is placed largely below the detection limit in the HI and microarray analyses, while the immune component still has substantial density below the detection limit (

Throughout, we have assumed that the susceptible, immune, and infected component distributions are independent of age. This is done for simplicity, and because allowing for age dependence in the component distributions would lead to identifiability problems, especially in older adults. As it is, the fit of the infected component distribution is strongly informed by children. However, visual inspection of the locations of the pre- and post-pandemic samples across all age groups in the A/2009–A/1918 plane shows that most lie within the regions of high support of the model, i.e., there are very few outliers. This indicates that the fitted mixture model describes the data well, not only in children but also in older age groups.

Number of samples in the earlier survey (see main text), and the subset that has been tested with the microarray. Not eligible for selection were pre-pandemic samples collected after October 11 (42 samples), and post-pandemic samples from persons.

(DOCX)

Mean and variance of susceptible, immune, and infected component distributions of the univariate mixture model fitted to microarray responses against A/2009 (H1N1), and to the standardised HI titers.

(DOCX)

Mean and variance of the susceptible, immune, and infected component distribution of the bivariate mixture fit to A/2009 (H1N1) and A/1918 (H1N1).

(DOCX)