Pool testing on random and natural clusters of individuals: Optimisation of SARS-CoV-2 surveillance in the presence of low viral load samples

Facing the SARS-CoV-2 epidemic requires intensive testing of the population to identify and isolate infected subjects early. During the first emergency phase of the epidemic, RT-qPCR on nasopharyngeal (NP) swabs, which is the most reliable technique to detect ongoing infections, exhibited limitations due to availability of reagents and budget constraints. This stressed the need to develop screening procedures that require fewer resources and are suitable to be extended to larger portions of the population. RT-qPCR on pooled samples from individual NP swabs seems to be a promising technique to improve surveillance. We performed preliminary experimental analyses aimed at investigating the performance of pool testing on samples with low viral load, and we used Monte Carlo (MC) simulations to evaluate alternative screening protocols based on sample pooling, tailored to contexts characterized by different infection prevalence. We focused on the role of pool size and the opportunity to develop strategies that take advantage of natural clustering structures in the population, e.g. families, school classes, hospital rooms. Despite the use of a limited number of specimens, our results suggest that, while high viral load samples seem to be detectable even in a pool with 29 negative samples, positive specimens with low viral load may be masked by the negative samples, unless smaller pools are used. The results of MC simulations confirm that pool testing is useful in contexts where the infection prevalence is low. The gain of pool testing in saving resources can be very high, and can be optimized by selecting appropriate group sizes. Exploiting natural groups makes the definition of larger pools convenient and potentially overcomes the issue of low viral load samples by increasing the probability of identifying more than one positive in the same pool.


Introduction
Since the first detection in Wuhan, China, in December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the pathogen of coronavirus disease 2019 (COVID-19), has spread worldwide to become a pandemic [1].
The most commonly used and reliable test to confirm the presence of SARS-CoV-2 is real-time PCR performed on nasopharyngeal (NP) swabs or other respiratory tract specimens, amplifying the viral RNA genes of the envelope (env), nucleocapsid (N), spike (S), RNA-dependent RNA polymerase (RdRp), and ORF1. RT-qPCR on RNA extracted from NP swabs is considered the gold standard to assess the presence of SARS-CoV-2, with a specificity of 100% and a sensitivity of 93-100% [2,3]. However, several pre-analytical and analytical vulnerabilities may affect the stability of NP swabs, reducing the test performance [4,5]. For example, it is widely known that RNA is stable when stored at 2-8°C for up to 72 hours, but data regarding the effect of storage at +4°C or at -20°C before or after virus inactivation on viral RNA detection have not been reported so far [5].
In China and Europe, during the epidemic emergency in spring 2020, an acute shortage of reagents, together with the related choice of performing RT-qPCR tests preferentially on symptomatic patients, led to significant underestimation of the actual infection burden, leaving many asymptomatic patients undetected [6,7]. This highlighted a serious weakness in the system, stressing the need to develop screening procedures that require fewer resources and can be extended to larger portions of the population, thus allowing early detection and isolation of new cases.
RT-qPCR on samples obtained by pooling individual NP swabs seems to be a promising technique to improve surveillance. In fact, performing tests on pooled samples from a group of subjects and proceeding to single testing only for the groups that test positive may save time and resources [8-14]. However, a crucial point in pool testing is that pooling may decrease the sensitivity of RT-PCR assays due to specimen dilution, leading to a higher rate of false negatives [15]. This issue appears to be particularly relevant in the presence of low viral load infections, such as those observed during the second phase of the epidemic. After the acute phase of the emergency, cases positive for only one viral gene with a high cycle threshold (Ct, the number of replication cycles required to produce a detectable fluorescent signal; lower Ct values represent higher viral RNA loads) were frequently observed. More generally, the issue arises whenever testing is extended to a large number of asymptomatic or pauci-symptomatic subjects. For this reason, it is a priority to understand whether the sensitivity of the RT-PCR assays is preserved when pooling, and in particular to determine the maximum pool size that guarantees the expected sensitivity in a context where low viral load samples predominate. Several studies have investigated RT-qPCR pool testing through laboratory experiments or in pilot studies [9,12,13,15-20]. Some of them focused on the dilution effect and, with few exceptions, found that samples with Ct > 35 are not detectable even in pools of 5 samples [15,18-20]. Bateman and colleagues [15], on the basis of laboratory analyses on pools with dilutions 1:5, 1:10 and 1:50, concluded that samples with Ct ≥ 32 are sometimes not detectable even in pools of 5 and 10.
An apparently different but related issue concerns the setting up of tools to define appropriate screening protocols based on sample pooling, tailored to the specific real contexts in which they will be applied [10,11,14,21-27]. In fact, the pool size which optimizes the gain with respect to single RT-PCR tests could vary depending on the characteristics of the population to which pool testing is applied. The prevalence of infection and the presence of "natural" clusters of subjects (e.g. families, school classes, hospital rooms) on which the pool test could be performed are key features to determine the best testing strategy.
Our study aims to contribute to the discussion about the use of RT-qPCR pool testing through a reflection on its actual potential in the second/third phase of the epidemic. In particular, our goal is twofold: first, to provide new evidence from laboratory analyses about the performance of pool testing on samples with low viral load, which is still limited in the literature; second, to evaluate through simulations alternative screening protocols based on sample pooling in contexts characterized by different infection prevalence. Regarding the first objective, we report the results of a preliminary analysis conducted on a small number of samples, aimed at investigating the COVID-19 RNA stability at different storage temperatures and times from collection, and the ability of the RT-qPCR pool testing to detect positive samples even with Ct above 35. Regarding the second objective, we performed Monte Carlo (MC) simulation analyses to evaluate the performance of pool testing when applied in pseudo-populations with different prevalence of infection. We focused on the role of pool size and on the opportunity to take advantage of the presence of a natural clustering structure in the population in order to increase the gain of the procedure in terms of saved RT-qPCR analyses.

Laboratory analyses
The laboratory analyses were performed on completely anonymous leftover samples from swabs of patients already analysed for the presence of SARS-CoV-2. Samples were collected by someone other than the authors (nursing personnel). No consent was obtained from patients, because the samples were anonymized before we accessed them for the current analyses.
Each positive sample was retested individually after anonymization and then included in the pools; only the results from this second test were considered in subsequent evaluations.
Sample collection.
Nasopharyngeal swabs were collected using eSwab devices (Copan Italy) containing liquid Amies media. Samples were processed at the Regional Laboratory of Oncological Prevention Unit of the Institute for cancer prevention, research and oncological network (ISPRO), Florence (Italy).
Single sample test.
Nasopharyngeal swabs were first tested as single samples immediately after arriving in the lab: 300 μL of swab transport medium was inactivated with 225 μL of lysis buffer and 15 μL of Proteinase K, and incubated at 56°C for 15 minutes.
Viral RNA was extracted using an automated system (NIMBUS IVD, Seegene) and SARS-CoV-2 detection was performed by real-time RT-PCR (Allplex 2019-nCoV Assay, Seegene; CFX96, Bio-Rad), amplifying three viral genes (E, RdRP and N) and a process Internal Control (IC). A target was considered amplified if its Ct was below 40 cycles. Results were interpreted as follows: viral RNA was considered not detected if only the IC was amplified; viral RNA was considered detected if at least one of the viral genes was amplified; viral RNA was considered detected with low viral load if one or more of the viral genes were amplified with a Ct > 35.
After performing the first RT-qPCR, the remaining swab transport media of 7 positive samples were processed in 3 different ways: The 7 samples were re-tested after thawing, under the same analytical conditions of the first analysis.
First, we prepared 3 negative pools (one for each group of positive samples), each obtained by mixing equal volumes of 10 negative nasopharyngeal swabs. Then 7 pools of 5 specimens and 7 pools of 10 specimens, each containing only one positive sample, were prepared. Specifically, for each positive sample (4 in A, 2 in B, 1 in C), we mixed 432 μL of negative pool and 108 μL of the positive sample to obtain a pool of 5 specimens, and 486 μL of negative pool and 54 μL of the positive sample to obtain a pool of 10 specimens.
For samples 5B and 7C we also tested pools with 19 and 29 negative samples, diluting the pool of 10 samples 1:2 and 1:3 with the negative pool.
Each pool was then tested under the same conditions as the single samples.
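The pooling volumes above follow from simple proportions, which can be sketched as below. The function names are hypothetical. The Ct shift is an idealization that assumes 100% PCR efficiency (template doubling at every cycle), so it only indicates the loss expected from dilution alone and is not a result of the laboratory analyses.

```python
import math

def pool_volumes(total_ul, pool_size):
    """Volumes (positive sample, negative pool) for a 1-in-pool_size mix."""
    positive_ul = total_ul / pool_size
    return positive_ul, total_ul - positive_ul

def ideal_ct_shift(dilution_factor):
    """Ct increase from dilution alone, ASSUMING 100% PCR efficiency.

    This is a textbook idealization, not a value measured in the study.
    """
    return math.log2(dilution_factor)

# 540 uL total volume, as in the pools of 5 and 10 described above
for size in (5, 10):
    pos_ul, neg_ul = pool_volumes(540, size)
    print(f"pool of {size}: {pos_ul} uL positive + {neg_ul} uL negative pool, "
          f"ideal Ct shift {ideal_ct_shift(size):.2f}")

# Diluting the pool of 10 a further 1:2 or 1:3 gives 1-in-20 and 1-in-30 mixes
for factor in (20, 30):
    print(f"1 in {factor}: ideal Ct shift {ideal_ct_shift(factor):.2f}")
```

Under this idealization, a sample close to the Ct cutoff of 40 can be pushed beyond it by pooling, which is the concern raised for low viral load samples.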

Monte Carlo simulations
Monte Carlo analyses were implemented considering a pseudo-population of N = 10000 subjects, on which a two-stage screening procedure is applied: first subjects are grouped and pool testing is performed on each group, then individual RT-qPCR tests are performed on individuals belonging to the groups that tested positive.
We considered 6×5 scenarios defined by the percentage of infected subjects in the population (prevalence, p) and the group size used for pool testing (k). Specifically, we considered p = 0.003, 0.005, 0.01, 0.03, 0.05, 0.1, and k = 2, 3, 4, 5, 10. The values of p were tailored to reproduce contexts of very low/moderate prevalence of ongoing infections.
For each scenario, we hypothesized two different strategies for group formation. With the first strategy (R), groups are randomly defined, i.e., they are composed by randomly partitioning the population into n = N/k groups. When the prevalence is small (i.e. the number of infected individuals is small compared to n), this leads, with rare exceptions, to groups with at most a single infected individual.
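Strategy R can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that the number of infected subjects is fixed at round(N × p) and that, when k does not divide N, the last group is simply smaller. Both choices are assumptions, as the text does not specify them.

```python
import random

def random_groups(n_subjects, prevalence, k, rng):
    """Randomly partition a pseudo-population into groups of size k.

    Returns the number of infected subjects in each group.
    """
    # Assumption: a fixed number of infected subjects, round(N * p)
    n_infected = round(n_subjects * prevalence)
    status = [1] * n_infected + [0] * (n_subjects - n_infected)
    rng.shuffle(status)
    # Assumption: if k does not divide N, the last group is smaller
    return [sum(status[i:i + k]) for i in range(0, n_subjects, k)]

rng = random.Random(1)
counts = random_groups(10_000, 0.005, 5, rng)
positive = [c for c in counts if c > 0]
print("positive groups:", len(positive))
print("mean infected per positive group:", sum(positive) / len(positive))
```

At low prevalence the mean number of infected subjects per positive group stays close to 1, in line with the remark above.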
With the second strategy (C), groups are generated assuming that they correspond to natural clusters in the population and, specifically, that if one member is infected, other group members are more likely to be infected as well. Examples of natural clusters in a real population are families, subjects sharing the same workplace, patients in the same room or on the same floor of a hospital, and classmates. In order to generate these groups, we assumed that the number of positive subjects X within each positive group of size k followed a zero-truncated Binomial distribution with parameters k and π, that is, P(X = x) = C(k, x) π^x (1 − π)^(k − x) / (1 − (1 − π)^k) for x = 1, ..., k, where C(k, x) is the binomial coefficient. For each k, we considered different values of π: the larger the value of π, the more the infected subjects tend to concentrate in the same groups. As a consequence, as π increases, the number of positive groups decreases. It is worth noting that each combination of k and π produces a different expected number of positive subjects within each positive group, corresponding to the expected value of the zero-truncated Binomial distribution, πk/(1 − (1 − π)^k). Specifically, for each k, we considered values of π such that the expected number of positive subjects within each positive group ranged between low (less than 2) and quite high values (90% of k). Even if for each scenario, i.e. each pair of k and π, the expected number of infected specimens was equal for all positive groups, the observed number could vary from 1 to k.
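The zero-truncated Binomial distribution used for strategy C can be written down directly. The sketch below uses hypothetical function names and the standard definition of the distribution. It checks numerically that the probabilities sum to 1 and that the closed-form mean agrees with the mean computed from the probabilities.

```python
from math import comb

def zt_binom_pmf(x, k, pi):
    """P(X = x) for a zero-truncated Binomial(k, pi), with x = 1, ..., k."""
    if not 1 <= x <= k:
        return 0.0
    return comb(k, x) * pi**x * (1 - pi)**(k - x) / (1 - (1 - pi)**k)

def zt_binom_mean(k, pi):
    """Expected number of positives in a positive group: k*pi / (1 - (1-pi)^k)."""
    return k * pi / (1 - (1 - pi)**k)

k = 10
for pi in (0.05, 0.3, 0.9):   # illustrative values, not the ones in Table 3
    total = sum(zt_binom_pmf(x, k, pi) for x in range(1, k + 1))
    mean_from_pmf = sum(x * zt_binom_pmf(x, k, pi) for x in range(1, k + 1))
    print(pi, round(total, 6), round(mean_from_pmf, 4), round(zt_binom_mean(k, pi), 4))
```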
From a practical point of view, simulations were performed iteratively by randomly sampling from a Binomial distribution of parameters k and π the number of infected individuals to be assigned to the first group of k individuals, then to the second one and so on, until there were no more infected individuals to be assigned to a pool. All the remaining groups were assumed to include zero infected specimens. Since positive groups contain at least one infected individual, the actual distribution of the number of infected individuals in these groups is a zero-truncated Binomial distribution.
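The sequential assignment just described might be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: capping a draw at the number of infected subjects still to be assigned, treating a zero draw as a negative group, and fixing the total number of infected subjects at round(N × p) are all assumptions.

```python
import random

def cluster_groups(n_subjects, prevalence, k, pi, rng):
    """Assign infected subjects to groups of size k by sequential Binomial draws.

    Returns the number of infected subjects in each group.
    """
    n_groups = n_subjects // k
    remaining = round(n_subjects * prevalence)   # assumption: fixed total
    counts = []
    while remaining > 0 and len(counts) < n_groups:
        # One Binomial(k, pi) draw as a sum of k Bernoulli trials
        draw = sum(rng.random() < pi for _ in range(k))
        draw = min(draw, remaining)              # assumption: cap at what is left
        counts.append(draw)                      # a zero draw is a negative group
        remaining -= draw
    # All remaining groups contain no infected subjects
    counts += [0] * (n_groups - len(counts))
    return counts

rng = random.Random(7)
counts = cluster_groups(10_000, 0.01, 10, 0.3, rng)
positive = [c for c in counts if c > 0]
print("positive groups:", len(positive))
print("mean infected per positive group:", sum(positive) / len(positive))
```

For very small π combined with high prevalence the loop could run out of groups before all infected subjects are placed; the sketch does not handle that case.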
Taking the individual RT-qPCR test as the gold standard, in our simulation we assumed that the specificity of pool testing was nearly optimal and equal to the one estimated by Hogan and colleagues in one of the first papers that focused on pool testing [9]: they found only 1 positive over 290 pools of uninfected samples, for a specificity of the results on the pool (probability that a pool is negative given that it does not include positive specimens) equal to 0.997. Regarding sensitivity (probability that a pool is positive given that it includes at least one positive specimen), we focused on an optimal situation where the sensitivity of pool testing was very high regardless of the pool size, and in particular equal to 0.995 [16]. This sensitivity value is likely appropriate in populations where the viral loads of the infected subjects are high. For example, in Bateman et al. [15] viral loads with Ct ≤ 28 were always detected in both 1:5 and 1:10 dilutions.
In order to get some insights on the role of the dilution effects, we also performed simulations under the assumption that the sensitivity of pool testing decreases as the pool size increases when groups are randomly constructed (see S1 Appendix for details and results).
For each p and k and each pool testing procedure, we ran 500 MC iterations and, for each iteration, we calculated the following quantities: number of RT-qPCR tests (total number of pool tests performed at the first step plus individual tests performed at the second step on the positive groups); percentage of saved RT-qPCR tests, defined as 100 × (1 − number of RT-qPCR tests/N); number of individuals receiving a false negative result; and probability that a subject receiving a negative result is actually not infected (negative predictive value, NPV). Note that, under the assumption that the individual RT-qPCR test is the gold standard, the pool testing procedure does not lead to false positive results (individuals belonging to false positive groups are correctly classified at the second step of the procedure, when individual tests are performed) and the positive predictive value is 1.
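The quantities listed above can be computed for a single simulated round as in the following sketch. It is illustrative only: the function name is hypothetical, groups are assumed to have equal size k, and the individual test is assumed to be perfect, as stated in the text.

```python
import random

def two_stage_metrics(infected_per_group, k, sens, spec, rng):
    """Outcome of one simulated two-stage screening round.

    infected_per_group: number of infected subjects in each group of size k.
    Assumes equal-sized groups and a perfect individual test (gold standard).
    """
    n_subjects = k * len(infected_per_group)
    n_infected = sum(infected_per_group)
    n_tests = len(infected_per_group)        # step 1: one test per pool
    false_negatives = 0
    for n_inf in infected_per_group:
        p_pool_positive = sens if n_inf > 0 else 1 - spec
        if rng.random() < p_pool_positive:
            n_tests += k                     # step 2: retest every member
        else:
            false_negatives += n_inf         # infected subjects in a missed pool
    saved_pct = 100 * (1 - n_tests / n_subjects)
    true_negatives = n_subjects - n_infected
    npv = true_negatives / (true_negatives + false_negatives)
    return n_tests, saved_pct, false_negatives, npv

rng = random.Random(0)
groups = [1] * 50 + [0] * 1950   # hypothetical input: 50 positive pools, k = 5
print(two_stage_metrics(groups, 5, 0.995, 0.997, rng))
```

Uninfected members of a false positive pool are cleared at the second step, so they count as true negatives, which is why the positive predictive value is 1.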
For each simulation setting, we calculated the MC mean and 90% variability interval (5 and 95 percentiles of the simulated values) for each quantity of interest.
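A summary of this kind can be obtained with the Python standard library, as sketched below. The percentile interpolation rule (here the "inclusive" method of `statistics.quantiles`) is an assumption, since the text does not say which rule was used.

```python
import statistics

def mc_summary(values):
    """MC mean and 90% variability interval (5th and 95th percentiles)."""
    # n=20 yields cut points at 5%, 10%, ..., 95%
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return statistics.fmean(values), cuts[0], cuts[-1]

# Hypothetical simulated values, for illustration only
mean, low, high = mc_summary(list(range(101)))
print(mean, low, high)
```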

Laboratory analyses
After thawing, the samples from group A showed a reduction of viral load compared to the first test, with an increase in Ct ranging from 0.67 to 4.53. The ΔCt (Ct at the re-test minus Ct at the first test) was greater the lower the viral load. The viral load also decreased in the 5B sample, but the increase in Ct was less evident. Samples 6B and 7C maintained the same viral load in the repeated test (Table 1).
As expected, under the pool testing strategy R the average number of positive groups and the average number of infected subjects per group increased with increasing p and/or k (Table 2).
Table 2. Pool testing on random groups: Monte Carlo (MC) mean of the number of positive subjects within each positive group and of the number of positive groups, by prevalence (p) and group size (k). doi:10.1371/journal.pone.0251589.t002
Table 3 summarizes the group composition under the pool testing strategy C. In particular, for each value of k and π, it reports the expected number of positive subjects within each positive group and, for each prevalence p, the corresponding MC mean of the number of positive groups. Of course, given the expected number of positive subjects within each positive group, the number of positive groups increased with increasing prevalence.
Table 3. Pool testing on natural clusters: Expected number of positive subjects within each positive group, by group size (k) and Binomial probability (π), and Monte Carlo mean of the number of positive groups, by group size (k), Binomial probability (π) and prevalence (p). doi:10.1371/journal.pone.0251589.t003
Table 4 shows, for the random pool testing strategy, MC means and 90% variability intervals of the number of RT-qPCR tests, by prevalence and group size, assuming a sensitivity of pool testing equal to 0.995 regardless of k (no dilution effects). In populations where the proportion of infected individuals was lower than 1%, the total number of RT-qPCR tests progressively decreased as the group size increased, with rather tight non-overlapping 90% variability intervals. Moreover, the lower the prevalence of the disease, the lower the total number of RT-qPCR tests was. For populations with prevalence equal to 3%, our simulations still suggest an inverse relationship between k and the total number of required RT-qPCR tests as long as k < 5, while for group sizes of 5 and 10 subjects the expected number of tests was similar. In populations where the prevalence was 5% or 10% the best group size ranged between 3 and 5; k = 10 was not advisable, as it implied the highest number of RT-qPCR tests.
All the pool testing strategies in Table 4 appeared to perform well in terms of case detection. Taking the individual RT-qPCR test as the gold standard, the NPVs were greater than 0.999 for values of prevalence and group sizes within the ranges considered in this analysis (results not reported). The expected number of false negatives (which represents a constant proportion of the total number of cases in each population) obviously increased with the prevalence: in the simulated population of 10000 it varied from 0.2 for p = 0.003 to 5 for p = 0.1 (see S1 Table).
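The way the number of tests depends on p and k under random grouping can be cross-checked with a closed-form approximation. The sketch below is not part of the original analysis: it assumes that infection status is independent across the members of a pool (a Binomial approximation to random partitioning of a finite population) and uses the sensitivity and specificity values stated in the methods. The function names are hypothetical.

```python
def expected_tests_fraction(p, k, sens=0.995, spec=0.997):
    """Expected RT-qPCR tests per subject for two-stage random pooling.

    ASSUMES independent infection status within a pool.
    """
    p_clean = (1 - p) ** k                       # pool has no infected member
    p_pool_positive = sens * (1 - p_clean) + (1 - spec) * p_clean
    return 1 / k + p_pool_positive               # pool test share + retests

def expected_saved_pct(p, k, sens=0.995, spec=0.997):
    """Expected percentage of saved tests relative to individual testing."""
    return 100 * (1 - expected_tests_fraction(p, k, sens, spec))

for p in (0.003, 0.005, 0.01, 0.03, 0.05, 0.1):
    row = ", ".join(f"k={k}: {expected_saved_pct(p, k):.1f}%" for k in (2, 3, 4, 5, 10))
    print(f"p={p}: {row}")
```

Under this approximation the expected saving grows with k at low prevalence, while at p = 0.1 the intermediate group sizes perform best and k = 10 requires the most tests.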

Fig 1 shows the MC means and 90% variability intervals of the percentage of saved RT-qPCR tests with respect to the gold standard procedure for the random strategy R. In populations where the proportion of infected individuals was lower than 1%, the pool-testing strategies reduced the number of required RT-qPCR tests by more than 47%, and the larger the group size, the higher the percentage of saved RT-qPCR tests was. Specifically, the main gain was obtained by applying pool testing with groups of size 5 and 10, which saved more than 74% and 80% of RT-qPCR tests, respectively. In populations where the proportion of infected individuals was greater than or equal to 3%, our results suggest that the best pool-testing strategies were those with k ranging between 4 and 5.
Each panel in Fig 2 shows the percentage of saved RT-qPCR tests as a function of prevalence, p, and group size, k, in the case of natural clusters. The four panels refer to four scenarios of π, which tunes the natural correlation between individuals in the same group. Taking advantage of this natural correlation, pools of 10 subjects provided a gain even in situations where the prevalence was relatively high (5%).

Discussion
The SARS-CoV-2 pandemic represents a hard challenge for health-care systems and their facilities. Expanded molecular testing for SARS-CoV-2 is urgently needed to enable identification of infected individuals, and tracing and quarantining of their contacts. Since the beginning of the emergency, the strategy of pool testing has been proposed as an alternative to RT-qPCR single tests to save time and resources [8,9,12,13]. However, sample pooling dilutes the primary specimens, which may cause samples with low viral load to go undetected, and these could represent a large number of cases in some phases of the epidemic. Moreover, there is more than one way to perform pool testing: different strategies for group creation can be used, which can be more or less complex and more or less convenient depending on the actual epidemic scenario [14].
In this study, we first tested the stability of nasopharyngeal swabs and the ability to detect positive samples with low viral load in pools of different sizes. Then, we investigated through simulations the performance of alternative pool testing strategies, quantifying the potential gain in terms of saved RT-qPCR tests as the pool size changed, under different prevalence scenarios. Specifically, we focused on strategies where the groups were built at random and strategies that exploited the possible presence of natural clusters of infection in the population. While several published studies investigate the first kind of scenario [21-27], the possible advantages of performing pool testing on natural clusters have been less explored. In a recent paper, Reweley and colleagues [11] come to conclusions similar to ours. They do not address either the problem of dilution or false negative results, but show via simulations that pool testing could be more efficient than expected even for high levels of prevalence, provided that groups are constructed following the order of the specimens collected in the same sampling site. This order could in fact reflect the presence of infection clusters in the population.
The results of the laboratory analyses should be interpreted with caution because they were conducted on a limited number of specimens. However, they suggest that, while high viral load samples seem to be detectable even in pools with 29 negative samples, particular attention should be paid to specimens with high Ct: we were able to identify one sample with a Ct of 37.3 only in a pool with 4 negative specimens. This is in line with results reported elsewhere which show a decreased sensitivity of pool testing in the presence of low viral load samples [15,18,19].
The experimental results also indicate that inactivating nasopharyngeal eSwab devices as soon as possible and, in any case, before freezing the samples, is fundamental. We did not investigate the performance of RT-qPCR on samples stored at -80°C before inactivation, as suggested by CDC guidelines [28]. However, Torres and colleagues [18], using samples stored at -80°C before inactivation, were not able to detect samples with Ct > 35 even in mini-pools of 5 samples. We can therefore speculate that inactivating the nasopharyngeal swabs before freezing remains fundamental even when storing at -80°C.
The MC simulations indicate that adapting the pool size to different infection scenarios is challenging and should be carefully considered when planning screening strategies based on pool testing. Overall, our results confirm that pool testing is useful in epidemic phases or contexts where the infection prevalence is low. In these situations, the gain of using pool testing with respect to individual RT-qPCR tests can be very high in terms of saved resources, and can be optimized by selecting an appropriate group size. If a random criterion is used for group creation, pool testing on groups of 10 specimens could in principle be very convenient for values of prevalence up to 0.01, but smaller pools of 4 or 5 specimens should be preferred as the prevalence increases. On the contrary, if pool testing is conducted on natural clusters of infection, pools of 10 specimens could still be a good choice even for prevalence exceeding 0.01.
In our main MC simulations, we assumed very high sensitivity for pool testing, obtaining a very low number of false negatives even in the case of large prevalence. However, as already discussed, both our experimental results and evidence reported in the literature indicate that the sensitivity may not be optimal for low viral load samples, especially for large k [15,18-20]. Nevertheless, it is worth noting that defining the sensitivity of pool testing as a function of pool size is a highly speculative exercise. Even if laboratory results were available to derive the probability of detecting viral loads in pools of different size as a function of Ct (this is the idea in Bateman et al. [15]), or if mathematical models for the dilution effect had been developed [29], realistic simulations would require hypotheses about the distribution of the viral loads, which may change across populations and over time.
While a reduction of the sensitivity of pool testing as k increases is expected if the groups are randomly created and, consequently, the expected number of positive specimens in the positive pools is around 1 (for values of ongoing infection prevalence within the range considered in this paper), this may not be the case if pool testing is performed on natural groups (e.g. families). In fact, if pool testing is performed on natural groups, the probability of more than one positive specimen in the same group is higher, and this likely leads to an increase in the expected overall viral load of the pools containing infected individuals, making the virus detectable even if the viral loads of the single swabs are low. This could be an additional advantage of applying pool testing strategies which exploit natural clusters in the population or, more generally, which tend to put together subjects with a larger probability of being infected.
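The argument above rests on how often a positive pool contains more than one infected specimen. A rough comparison can be sketched as follows, under two assumptions that are not taken from the simulations: for random groups, infection status is treated as independent with probability p; for natural clusters, the zero-truncated Binomial model with parameter π is used. The function name and the numerical values are illustrative.

```python
def prob_multiple_given_positive(k, q):
    """P(X >= 2 | X >= 1) for X ~ Binomial(k, q)."""
    p_at_least_one = 1 - (1 - q) ** k
    p_exactly_one = k * q * (1 - q) ** (k - 1)
    return 1 - p_exactly_one / p_at_least_one

k = 10
# Random groups: q is the prevalence p (independence approximation)
print("random, p = 0.01:", round(prob_multiple_given_positive(k, 0.01), 3))
# Natural clusters: q is the Binomial probability pi (illustrative value)
print("clusters, pi = 0.3:", round(prob_multiple_given_positive(k, 0.3), 3))
```

With random groups at low prevalence, nearly all positive pools contain a single infected specimen, whereas under the cluster model most positive pools contain several.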
In balancing the gain in terms of saved time and resources with the accuracy of pool testing, it is crucial to assess the actual public health consequences of leaving low viral load infections undetected. In fact, a positive PCR result reflects the presence of viral RNA but does not necessarily indicate the identification of viable virus. Although viral RNA can be detected by PCR even after the resolution of symptoms, the amount of detected viral RNA is substantially reduced over time and generally below the threshold where replication-competent virus can be isolated [30-33]. Wölfel and colleagues [30] found that virus isolation is not successful beyond the 8th day from illness onset, when the viral load dramatically decreases, and <6 log10 RNA was previously shown to represent the viral RNA load threshold for virus infectivity.
Additionally, some studies documented, or did not rule out, a correlation between reduced infectivity and a decrease in viral load [34,35]. This hypothesis led the WHO to review the "Criteria for releasing COVID-19 patients from isolation" on June 17, 2020 [36]. If, according to this evidence, larger Ct values are indicative of lower infectivity, the reduced sensitivity of pool testing with respect to individual testing in detecting low viral loads would have less severe consequences.
Finally, we would like to remark that creating pools according to natural clusters in the population could provide an additional guarantee that any false negative groups have a negligible impact in terms of public health. In fact, the infectivity of a subject with low viral load belonging to a natural group (e.g. a family) where all other members are negative is likely below the threshold that allows contagion. Thus, if the low viral load is not detected because of the unfavourable specimen dilution (only one infected over k), the risk that new infections may derive from the undetected one could be low. On the contrary, if a low viral load is able to produce contagion, more than one infected specimen is expected in the natural cluster, with the consequence of a larger probability that the pool tests positive.

Conclusions
RT-qPCR pool testing can be a cost-effective procedure to perform screening on populations where the prevalence of ongoing infection is low.
Dilution of the specimens is a crucial issue and further investigation is needed to define testing procedures which minimize the degradation of viral RNA before sample pooling.
Exploiting the natural clusters in the population (e.g. families, school classes, hospital rooms) may enhance pool testing performance also in the presence of high rates of low viral load infections, allowing the definition of larger pools and increasing the gain in time and resources with respect to single RT-qPCR testing.
Assessing and comparing the performance of alternative screening procedures by simulations is a fundamental step before any practical implementation on real populations.
Supporting information
S1 Appendix. Simulations under a hypothetical scenario of dilution effect.