Sample Size Considerations for One-to-One Animal Transmission Studies of the Influenza A Viruses

Background
Animal transmission studies can provide important insights into host, viral and environmental factors affecting the transmission of viruses, including influenza A. The basic unit of analysis in typical animal transmission experiments is the presence or absence of transmission from an infectious animal to a susceptible animal. In studies comparing two groups (e.g. two host genetic variants, two virus strains, or two arrangements of animal cages), differences between groups are evaluated by comparing the proportion of pairs with successful transmission in each group. The present study aimed to discuss the statistical significance and power of one-to-one trials for estimating transmissibility and for identifying differences in transmissibility. The analyses are illustrated using transmission studies of influenza A viruses in the ferret model.

Methodology/Principal Findings
Employing the stochastic general epidemic model, the basic reproduction number, R0, is derived from the final state of an epidemic and is related to the probability of successful transmission in each one-to-one trial. In studies to estimate transmissibility, we show that 3 pairs of infectious/susceptible animals cannot demonstrate a transmissibility significantly higher than R0 = 1, even if infection occurs in all three pairs. In comparisons between two groups, at least 4 pairs of infectious/susceptible animals are required in each group to ensure high power to identify significant differences in transmissibility between the groups.

Conclusions
These results inform appropriate sample sizes for animal transmission experiments, while relating the observed proportion of infected pairs to R0, an interpretable epidemiological measure of transmissibility. In addition to the hypothesis testing results, the wide confidence intervals of R0 obtained with small sample sizes imply that any objective demonstration of difference or similarity should rest on a rigorously calculated sample size.


Introduction
The transmission potential of a respiratory virus is commonly measured by the basic reproduction number, R0, i.e. the average number of secondary cases produced by a typical primary case throughout its entire course of infection in a fully susceptible population, which is regarded as one of the most important quantities in infectious disease epidemiology [1]. The value of R0 not only indicates how transmissible an infected individual is, but also offers three epidemiological insights into transmission dynamics: (i) the risk of an epidemic given a certain number of infected individuals, (ii) the risk of infection for an individual over the course of an epidemic (given that an epidemic occurs), and (iii) the minimum control effort required to prevent or curb an epidemic, based on the threshold theorem [1]. Among various methods for estimating R0, animal transmission experiments have been a useful tool for measuring transmissibility under controlled conditions [2]. It should be noted, however, that R0 is not solely a property of the pathogen but reflects the entire host-pathogen-environment system, so R0 measured in an experimental setting is not directly applicable to other (e.g. natural) settings. Nevertheless, transmission experiments can identify the importance of various factors for transmissibility, including host factors (e.g. genetic variants or immune status), viral factors (e.g. different strains), environmental factors (e.g. ambient temperature or humidity) and their interactions, thereby providing important insights into the mechanisms of infection and transmission.
Given the multi-host nature of influenza A viruses, transmission experiments have helped to determine the molecular mechanisms of their adaptation to human hosts [3][4][5]. Transmission can be studied in humans, both in controlled experimental settings [6] and in natural settings [7], but such studies are resource intensive. Moreover, prior influenza exposure history varies between individuals and is difficult to control. Although animal studies cannot replace human studies, they can provide complementary information on factors affecting transmission. The ferret model tends to be preferred over mouse and guinea pig models for transmission experiments, because ferrets display clinical signs and symptoms including fever, nasal discharge and sneezing [8][9][10]. In some studies, the potential for respiratory droplet transmission (encompassing both droplet and aerosol transmission) has been examined by placing a susceptible ferret in a cage adjacent to that of a ferret inoculated with an influenza virus, allowing air to be exchanged between the cages. Typically, such one-to-one transmission has been examined in two to four pairs for each virus, and the proportion of pairs with successful transmission has been compared between two (or more) influenza viruses.
Despite the number of published ferret transmission studies of influenza viruses, R0 has not been explicitly estimated from these experiments, which typically involve very small sample sizes, and the sample size rationale for such experiments has not been extensively discussed. Although a common approach is to use three pairs of ferrets in each virus group and to compare the proportions of pairs with successful transmission between groups, the appropriate sample size and the proper analysis of the results have not been investigated. For instance, suppose that researchers observed no transmission for one virus (i.e. k/n = 0/3, where n and k denote the numbers of pairs and infected pairs, respectively), while all three pairs resulted in transmission for the other virus (i.e. k/n = 3/3). Given this result, we would like to know (i) whether the difference in transmissibility between the two groups is statistically significant and (ii) the degree of difference in transmissibility given the small sample size. The present study aimed to discuss the significance and power of one-to-one animal transmission experiments, using ferret transmission studies of influenza A viruses as a case study.

Materials and Methods
One-to-one transmission experiment data
We start by presenting summary results of published transmission studies of influenza A viruses using the ferret model. Table 1 summarizes a total of 12 transmission studies that were conducted under a one-to-one transmission design [3,5,8,11-19]. In the present study, we restrict our interest to the ferret model, especially its use in examining respiratory transmission, for consistency and clarity in both theory and biology. Of the 12 studies, nine investigated H1N1 viruses (two on 1918-19 pandemic viruses and seven on 2009 pandemic viruses), three investigated H5N1 viruses, two H3N2 viruses and one H2N2 viruses (some studies therefore examined more than one subtype). In principle, these studies share the same experimental design (i.e. inoculation of one ferret in a cage and exposure of another ferret in an adjacent cage), but the details varied. The air flow (e.g. direction and air exchange rate) was not regulated by common rules, and the viral dose used for inoculation was not identical among the studies; these differences in experimental design prohibit pooling of data from differently designed studies.
With regard to sample size (i.e. the number of pairs for each virus), all studies used small numbers of ferrets, most of them 3 pairs of infectious/susceptible ferrets in each group. Two studies used 4 pairs. Two studies used different numbers of pairs in the two groups: one used 2 pairs for the control and 4 pairs for the comparison group, and the other used 3 pairs for the control and 6 pairs for the comparison group. The results (i.e. the number of pairs with successful transmission) shown in Table 1 represent the highest and lowest reported numbers among all combinations of two viruses, and the original publications judged difference (or similarity) based on these results. The primary objective of the original studies was either to identify the molecular mechanisms (e.g. specific viral genes, amino acids or proteins) governing transmissibility (five studies) or to quantify or compare the capacity for aerosol transmission (seven studies).

Stochastic general epidemic model
To allow comparison of transmissibility, we express the result of a one-to-one transmission experiment (i.e. the proportion of pairs with transmission) as a function of R0. First, we assume that each pair is independent of the other pairs, which includes the absence of air exchange between pairs. Let $p_{s,i}(t)$ be the conditional probability of observing s susceptible and i infected ferrets at time t, given the initial numbers of susceptible and infected ferrets $(s_0, i_0)$ at time 0, i.e.,

$$p_{s,i}(t) = \Pr\{S(t) = s,\ I(t) = i \mid S(0) = s_0,\ I(0) = i_0\}.$$

The so-called "stochastic general epidemic" model is then described by the following differential-difference equation:

$$\frac{d p_{s,i}(t)}{dt} = \frac{R_0 (s+1)(i-1)}{N-1}\, p_{s+1,i-1}(t) - \left(\frac{R_0\, s\, i}{N-1} + i\right) p_{s,i}(t) + (i+1)\, p_{s,i+1}(t),$$

where N is the total population size (N = 2 in a one-to-one experiment) and t is measured in multiples of the mean infectious period (i.e. the time unit is normalized by the mean infectious period, so that each infectious animal recovers at rate 1). Note that the mass-action term is scaled by N − 1, not by N, because the small population size requires us to account precisely for the impact of N on the incidence term. That is, for small N the instantaneous risk of infection should be proportional to I(t)/(N − 1), which is easily seen for N = 2: if I(t)/N were used to calculate incidence, it would erroneously imply that only half of the infectious individuals, I(t)/2, contribute to transmission. Since the initial condition gives $p_{s_0,i_0}(0) = 1$, the probability of successful transmission by infinite time, q, is given by $p_{0,0}(\infty)$ [20][21][22], and the solution is

$$q = p_{0,0}(\infty) = \frac{R_0}{R_0 + 1}.$$

Note that the analytical solution of q for small N is $q = R_0/(R_0 + N - 1)$, which differs from what has been previously discussed [22]. Since the one-to-one transmission experiment deals with a binary outcome (success or failure of transmission), the probability of transmission is computed by employing a binomial distribution.
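The closed form for one-to-one pairs can be checked numerically. The Python sketch below is ours, not the authors' code, and its function names are hypothetical. It assumes that each infectious animal infects each susceptible at rate R0/(N − 1) and recovers at rate 1. It follows the embedded jump chain of infection and recovery events to obtain the final-size distribution. For N = 2, the probability that the susceptible animal ends up infected should reproduce q = R0/(R0 + 1).

```python
from __future__ import annotations

from functools import lru_cache


def final_size_distribution(r0: float, n_total: int, i0: int = 1) -> dict[int, float]:
    """Return {number of susceptibles escaping infection: probability}.

    Assumed rates (time in mean infectious periods): infection R0*s*i/(N-1),
    recovery i. From state (s, i) with i > 0 the next event is therefore an
    infection with probability R0*s / (R0*s + N - 1), otherwise a recovery.
    """
    if n_total < 2 or not 1 <= i0 < n_total:
        raise ValueError("need at least one susceptible and one infective")
    s0 = n_total - i0

    @lru_cache(maxsize=None)
    def absorb(s: int, i: int) -> tuple[float, ...]:
        # Probability vector over the final number of susceptibles, 0..s0.
        if i == 0:
            out = [0.0] * (s0 + 1)
            out[s] = 1.0
            return tuple(out)
        p_inf = r0 * s / (r0 * s + n_total - 1) if s > 0 else 0.0
        via_inf = absorb(s - 1, i + 1) if s > 0 else (0.0,) * (s0 + 1)
        via_rec = absorb(s, i - 1)
        return tuple(p_inf * a + (1 - p_inf) * b for a, b in zip(via_inf, via_rec))

    return dict(enumerate(absorb(s0, i0)))


def transmission_probability(r0: float) -> float:
    """q = p_{0,0}(inf) for a one-to-one pair: the single susceptible ends infected."""
    return final_size_distribution(r0, n_total=2)[0]


if __name__ == "__main__":
    for r0 in (0.5, 1.0, 2.0, 4.0):
        print(r0, transmission_probability(r0), r0 / (r0 + 1))
```

The jump-chain recursion avoids integrating the differential-difference equation. Only the final state matters for q, so the ordering of events is sufficient.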
For n independent pairs of one-to-one transmission experiments, the probability of observing k pairs with successful transmission is

$$\Pr(K = k) = \binom{n}{k} q^{k} (1-q)^{n-k}, \qquad q = \frac{R_0}{R_0 + 1}.$$

The maximum likelihood estimator of R0 based on the observed proportion of successful transmissions, k/n, is obtained by equating $R_0/(R_0 + 1) = k/n$, which yields

$$\hat{R}_0 = \frac{k}{n-k}.$$
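As an illustration, the binomial likelihood and its closed-form maximizer can be coded directly. This is a sketch with hypothetical function names. The estimate is infinite when every pair transmits, and a crude grid search confirms that k/(n − k) maximizes the likelihood for k/n = 2/3.

```python
from math import comb, inf


def binomial_pmf(k: int, n: int, q: float) -> float:
    """Probability that exactly k of n independent pairs transmit."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)


def r0_mle(k: int, n: int) -> float:
    """Maximum likelihood estimate k / (n - k); infinite when all pairs transmit."""
    if not 0 <= k <= n:
        raise ValueError("k must lie in 0..n")
    return inf if k == n else k / (n - k)


if __name__ == "__main__":
    print(r0_mle(2, 3))  # 2.0
    print(r0_mle(3, 3))  # inf
    # Grid search over R0 in 0.01..10 for the likelihood of 2 infected pairs out of 3.
    grid = [i / 100 for i in range(1, 1001)]
    best = max(grid, key=lambda r: binomial_pmf(2, 3, r / (r + 1)))
    print(best)  # 2.0
```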
The 100(1 − α)% confidence interval of R0 is calculated as $R_{0,\mathrm{CI}} = x/(n - x)$, where x (the bound on the expected number of infected pairs) satisfies

$$\sum_{j=0}^{k} \binom{n}{j} \left(\frac{x}{n}\right)^{j} \left(1 - \frac{x}{n}\right)^{n-j} = \frac{\alpha}{2}$$

for the upper bound and

$$\sum_{j=k}^{n} \binom{n}{j} \left(\frac{x}{n}\right)^{j} \left(1 - \frac{x}{n}\right)^{n-j} = \frac{\alpha}{2}$$

for the lower bound, except that the lower bound is 0 when k = 0 and the upper bound is infinite when k = n. In these boundary cases, the upper bound for k = 0 is $R_{0,\mathrm{upper}} = \left(1 - (\alpha/2)^{1/n}\right)/(\alpha/2)^{1/n}$ and the lower bound for k = n is $R_{0,\mathrm{lower}} = (\alpha/2)^{1/n}/\left(1 - (\alpha/2)^{1/n}\right)$, as can be derived from the binomial distribution [23]. In addition to the final size discussed above, statistical treatment of the transient state has been given elsewhere [24].
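A hedged Python sketch of this interval follows. We assume the bounds are the exact binomial (Clopper-Pearson type) limits on q, mapped to R0 through R0 = q/(1 − q). That is our reading of the construction described here. The bisection solver and all function names are our own illustrative choices.

```python
from math import comb, inf


def upper_tail(k: int, n: int, q: float) -> float:
    """P(K >= k) for K ~ Binomial(n, q); increasing in q."""
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


def solve_increasing(f, target: float, tol: float = 1e-12) -> float:
    """Bisection for f(q) = target on [0, 1], assuming f is increasing in q."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def q_to_r0(q: float) -> float:
    """Invert q = R0 / (R0 + 1)."""
    return inf if q >= 1.0 else q / (1 - q)


def r0_confidence_interval(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact 100(1 - alpha)% interval for R0 given k of n pairs infected."""
    # Lower limit: P(K >= k) = alpha/2, or 0 when no pair transmits.
    q_lo = 0.0 if k == 0 else solve_increasing(lambda q: upper_tail(k, n, q), alpha / 2)
    # Upper limit: P(K <= k) = alpha/2, i.e. P(K >= k + 1) = 1 - alpha/2; 1 when k = n.
    q_hi = 1.0 if k == n else solve_increasing(
        lambda q: upper_tail(k + 1, n, q), 1 - alpha / 2
    )
    return q_to_r0(q_lo), q_to_r0(q_hi)


if __name__ == "__main__":
    for k in range(4):
        lo, hi = r0_confidence_interval(k, 3)
        print(f"{k}/3: R0 in ({lo:.3f}, {hi:.3f})")
```

For k = 0 and k = n, the bisection reproduces the closed-form bounds given above, which makes a convenient self-check.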

Hypothesis testing
We next consider differences in transmissibility in the published experimental studies in two ways, because the null hypothesis was not always stated in the original articles in Table 1. Let R0,ref be a specified reference value of the basic reproduction number. The first way to compare transmissibility is to treat the result for each virus as a one-sample comparison. This is appropriate when R0,ref for the control virus can be assumed known (e.g. predetermined from published studies). In this scenario, we compare R0 against R0,ref, i.e.

$$H_0: R_0 = R_{0,\mathrm{ref}} \quad \text{versus} \quad H_1: R_0 > R_{0,\mathrm{ref}},$$

which may be intended to support the notion that some key molecular structure helped a specific virus acquire substantial transmissibility (e.g. by setting R0,ref = 1 or R0,ref = 0). The reverse direction, $H_1: R_0 < R_{0,\mathrm{ref}}$, is handled analogously using the lower tail. It should be noted that R0 depends on the experimental design (air changes per hour, air flow direction, etc.) and is not comparable between differently designed experiments. Using the relationship $q = R_0/(R_0 + 1)$, the comparison of transmissibility reduces to a one-sample comparison of a binomial proportion against $q_{\mathrm{ref}} = R_{0,\mathrm{ref}}/(R_{0,\mathrm{ref}} + 1)$. Given that k pairs resulted in infection, the p-value for testing these hypotheses is the probability of observing k or more infected pairs under $H_0$:

$$P = \sum_{x=0}^{n} I(x \ge k) \binom{n}{x} q_{\mathrm{ref}}^{x} (1 - q_{\mathrm{ref}})^{n-x},$$

where I(·) is the indicator function.
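The one-sample exact test can be written in a few lines. This is a minimal Python sketch with hypothetical names. Power is evaluated at an assumed true transmission probability q_true; choosing q_true = k/n is our assumption about how such a table could be filled, not something stated here. A sample size for a desired power could be found by increasing n until one_sample_power reaches the target.

```python
from math import comb


def binomial_pmf(x: int, n: int, q: float) -> float:
    return comb(n, x) * q**x * (1 - q) ** (n - x)


def one_sample_p_value(k: int, n: int, r0_ref: float) -> float:
    """One-tailed exact p-value for H1: R0 > r0_ref given k of n infected pairs."""
    q_ref = r0_ref / (r0_ref + 1)
    # Binomial probabilities weighted by the indicator I(x >= k).
    return sum(binomial_pmf(x, n, q_ref) for x in range(n + 1) if x >= k)


def one_sample_power(n: int, q_true: float, r0_ref: float, alpha: float = 0.05) -> float:
    """Probability of p <= alpha when the true transmission probability is q_true."""
    return sum(
        binomial_pmf(x, n, q_true)
        for x in range(n + 1)
        if one_sample_p_value(x, n, r0_ref) <= alpha
    )


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, one_sample_p_value(n, n, r0_ref=1.0))
    # 3 0.125
    # 4 0.0625
    # 5 0.03125
```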
The second way to test transmissibility is to treat the two virus groups within the same study in Table 1 as a comparison of two binomial proportions, q1 and q2 (or, equivalently, of the basic reproduction numbers estimated for the respective viruses, R0,1 and R0,2), under the hypotheses

$$H_0: q_1 = q_2 \quad \text{versus} \quad H_1: q_1 \neq q_2$$

(or H0: R0,1 = R0,2). The implementation is exactly the same as the two-tailed exact test for two samples that has been discussed elsewhere [25]. Differing numbers of pairs between the two virus groups are easily accommodated by allowing unequal sample sizes in the exact test. In both the one-sample and two-sample cases, the sample size would have to be determined directly from the binomial distribution, i.e. by choosing n such that the probability of a significant result under the assumed alternative reaches the desired power. Alternatively, the power calculation could rest on a modified Wald test statistic, such as the adjusted Wald interval of Agresti and Coull, which approximates the score confidence interval [26,27,28]. For numerical illustration, we consider the p-value and power for all possible final-size outcomes with n = 3, 4 and 5 pairs for both one-sample and two-sample comparisons. These numbers of pairs are considered because 3 pairs have been conventionally adopted, and we anticipate that 6 or more pairs may not be logistically feasible for testing many types of influenza virus at present. A one-sample comparison is made by a one-tailed exact binomial test, while a two-sample comparison rests on a two-tailed exact test. In addition to n = 3, 4 and 5, we examine the one-sample p-value for varying reference values of R0 and for numbers of pairs from 1 to 10, focusing on k = n (all pairs infected) and k = n − 1, which are frequent outcomes in published experimental studies.

Table 2 shows the p-value and power for one-sample comparisons with 3, 4 or 5 pairs.
Even when all pairs are infected in a 3- or 4-pair study, the experiment cannot show that R0 is significantly greater than 1. Only a result of 5/5 allows R0 to be declared significantly greater than 1. Moreover, when all pairs are infected, only the lower bound of R0 can be quantified; the point estimate and the upper bound of R0 are infinite. Figure 1 shows the p-value for different numbers of pairs when all pairs are infected (k = n) or all but one pair are infected (k = n − 1). When the reference value of R0 is as large as 2, three pairs are not enough to demonstrate R0 > 2 at significance level α = 0.05; at least eight pairs would be needed, all of which must be infected. At a stricter threshold (e.g. α = 0.01), seven or more pairs, all infected, would be required to reject R0,null = 1. In the case of k = n − 1, even ten pairs would not be enough to demonstrate R0 > 2 at α = 0.05.
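The pair counts quoted for Figure 1 follow from the binomial tail alone. The hypothetical helper below searches for the smallest number of pairs at which k = n − missed infected pairs gives p ≤ α. It is a sketch for checking these numbers, not the code behind the figure.

```python
from math import comb
from typing import Optional


def p_value_upper(k: int, n: int, r0_ref: float) -> float:
    """P(K >= k) under q_ref = r0_ref / (r0_ref + 1)."""
    q = r0_ref / (r0_ref + 1)
    return sum(comb(n, x) * q**x * (1 - q) ** (n - x) for x in range(k, n + 1))


def min_pairs(r0_ref: float, alpha: float, missed: int = 0, n_max: int = 50) -> Optional[int]:
    """Smallest n such that n - missed infected pairs rejects R0 = r0_ref at alpha."""
    for n in range(missed + 1, n_max + 1):
        if p_value_upper(n - missed, n, r0_ref) <= alpha:
            return n
    return None  # not reachable within n_max pairs


if __name__ == "__main__":
    print(min_pairs(1.0, 0.05))  # 5: all five pairs infected
    print(min_pairs(2.0, 0.05))  # 8
    print(min_pairs(1.0, 0.01))  # 7
    print(min_pairs(2.0, 0.05, missed=1, n_max=10))  # None
```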

One-sample comparison
Likewise, when one pair escapes infection (k = n − 1), Figure 1 and Table 2 consistently indicate that a study with five or fewer pairs cannot establish whether R0 is significantly greater than 1: the results 2/3, 3/4 and 4/5 do not indicate a significant difference from R0 = 1. None of the other combinations in Table 2 can establish that R0 is significantly greater than 1. More importantly, for these combinations either the lower or the upper 95% confidence limit of the estimated R0 always takes an extreme value (i.e. a lower bound of 0 or an upper bound of infinity).
If a transmission study aims to demonstrate R0 > 0, the presence of at least one successful transmission (i.e. any k/n with k ≥ 1) yields a significant result with p < 0.01. However, it should be remembered that power is limited for k/n = 1/4 and 1/5 (Table 2). When one intends to demonstrate that the transmission potential is less than 1, the examined sample sizes are not large enough to support significant differences (Table 2). That is, with 3, 4 or 5 pairs, it is more feasible to show that R0 > 1 than to demonstrate that R0 < 1.

Two-sample comparison
Table 3 summarizes the p-value and power for two-sample comparisons with 3, 4 or 5 pairs per group. With three pairs in each group, it is impossible to demonstrate any significant difference between the two groups. With four pairs, only the combination 0/4 versus 4/4 indicates a significant difference. With five pairs, three combinations (0/5 vs 5/5, 1/5 vs 5/5 and 0/5 vs 4/5) suggest a significant difference in transmission potential.
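The two-sample results can be reproduced with a short Python sketch. It computes a two-tailed exact p-value by the usual Fisher rule: condition on the total number of infected pairs, then sum the hypergeometric probabilities of all tables no more probable than the observed one. It then lists the significant outcomes for equal group sizes. We assume this matches the two-tailed exact test used here; other two-tailed conventions, such as doubling the one-tailed p-value, can give slightly different values. Function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def fisher_two_tailed(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-tailed Fisher exact p-value for k1/n1 versus k2/n2 infected pairs."""
    total = k1 + k2
    denom = comb(n1 + n2, total)

    def prob(x: int) -> float:
        # Hypergeometric probability of x infected pairs in group 1.
        return comb(n1, x) * comb(n2, total - x) / denom

    observed = prob(k1)
    lo, hi = max(0, total - n2), min(n1, total)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))


def significant_pairs(n: int, alpha: float = 0.05) -> list:
    """All outcomes k1/n vs k2/n (k1 <= k2) with a two-tailed p-value <= alpha."""
    hits = []
    for k1, k2 in combinations_with_replacement(range(n + 1), 2):
        p = fisher_two_tailed(k1, n, k2, n)
        if p <= alpha:
            hits.append((f"{k1}/{n}", f"{k2}/{n}", round(p, 4)))
    return hits


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, significant_pairs(n))
    # n = 3: none; n = 4: 0/4 vs 4/4; n = 5: 0/5 vs 4/5, 0/5 vs 5/5, 1/5 vs 5/5
```

The same function handles unequal group sizes (n1 ≠ n2), as in the two studies of Table 1 that used different numbers of pairs per group.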

Discussion
The present study discussed sample size considerations for one-to-one experimental studies of influenza A virus transmission. Employing the stochastic general epidemic model, R0 was derived from the final state of an epidemic [29,30,31], and its relationship to the probability of successful transmission in a one-to-one trial was explained. Three findings are particularly notable. First, k/n = 3/3 and 4/4 do not indicate that R0 significantly exceeds 1 in a one-sample comparison; at least five pairs would be required to demonstrate a significant difference in the one-sample comparison. Second, n = 3 is not enough to show any significant difference in two-sample comparisons. Third, k = n can yield a significant difference in a one-sample comparison when n = 5 or greater, but one has to remember that the point estimate and the upper confidence limit of R0 are then infinite. That is, while such an experiment may show a significant difference from the reference value, k = n informs only the lower bound of R0. With very limited sample sizes such as n = 3, 4 or 5, either the lower or the upper 95% confidence limit always takes an extreme value (i.e. lower = 0 or upper = ∞). Keeping these points in mind, one can plan one-to-one transmission studies with reference to our computed results in Tables 2 and 3, while relating the observed proportion of infected pairs to R0, an interpretable epidemiological measure of transmissibility.

Table 2. One-tailed test results of the basic reproduction number based on one-to-one transmission experiment.

Number of pairs
The most important caveat of the present study in relation to common practice is that n = 3 is not enough to show either a significant difference between groups or R0 > 1 in a one-sample comparison, although it can demonstrate R0 > 0. In particular, comparing a group of n = 3 against a reference group of the same size does not allow researchers to demonstrate any significant difference in transmissibility between the two groups. If two samples have to be compared, n = 4 should be regarded as the minimum. The result would then have to be k = n for n = 4, or k = n or k = n − 1 for n = 5, together with the absence of infected pairs in the control group (the one exception being 5/5 versus 1/5). To interpret those published studies in Table 1 that concluded a difference in transmissibility between two viruses, a one-sample interpretation for each virus against R0,null = 0 may be more appropriate than a two-sample comparison. Overinterpretation of results without a significant difference should be avoided.
It should be noted that demonstrating similar transmissibility between two groups is even more difficult in this context, for two reasons. First, the sample size (i.e. the number of pairs) is very limited, so a significant difference between two groups is frequently not observed (Table 3). Second, demonstrating similarity must be distinguished from showing the absence of a significant difference, especially when sample sizes are very limited. Demonstrating similarity in transmissibility would likely require much larger sample sizes, as in non-inferiority and equivalence randomized trials [32]. Similarly, demonstrating the absence of substantial transmissibility in a single group is also difficult: as shown in Table 2, R0 < 1 cannot be demonstrated with n = 3, 4 or 5.
Two limitations should be noted. First, every transmission experiment examines not only the frequency of successful transmission but also other outcomes, including mortality, weight loss, patterns of virus shedding, clinical signs and symptoms, and behavioral changes. Thus, although quantification of transmissibility should strictly adhere to the frequency of transmission events, adaptation, pathogenicity and infectiousness (e.g. in the sense of virus replication in infected hosts) should be judged from multiple results. Moreover, the objective of a transmission experiment may not necessarily be to demonstrate differential transmissibility (e.g. it may only be to prove that pre-symptomatic transmission can occur [33]). In this regard, the present study has focused on only a single aspect of experimental findings. Second, we assumed independence between pairs, which may not strictly hold in all published studies. If a study handled a virus with substantially high aerosol transmission potential, this assumption would be violated. Explicitly addressing this point cannot be achieved with the simple stochastic general epidemic model and would instead require much more complex modeling analysis. Resolving this second problem (dependence between pairs) through mathematical modeling will be the subject of our future studies.

It is important that researchers be able to calculate the most appropriate sample size (with or without a reference group) depending on the study objectives. A sample size rationale should also be formulated for other transmission experiment designs, such as exposing multiple animals to multiple infectious animals, or exposing two animals in two different manners (e.g. with and without direct contact).

Table 3. Two-tailed comparison of the basic reproduction numbers based on one-to-one transmission experiment (H0: R0,1 = R0,2).
Because translating the proportion of infected pairs into R0 permits objective-oriented experimental design, the objectives, design and findings of transmission experiment studies should be reviewed, and the corresponding sample size rationale discussed. The present study can be regarded as a starting point for a more extensive consideration of hypothesis testing of animal transmission experiment results using epidemiologically well-defined parameters.

Author Contributions
Conceived and designed the experiments: HN. Performed the experiments: HN. Analyzed the data: HN. Contributed reagents/materials/analysis tools: HN HLY BJC. Wrote the paper: HN HLY BJC.