OptisampleTM: Open web-based application to optimize sampling strategies for active surveillance activities at the herd level illustrated using Porcine Respiratory Reproductive Syndrome (PRRS)

Porcine reproductive and respiratory syndrome virus (PRRSv) infection causes a devastating economic impact to the swine industry. Active surveillance is routinely conducted in many swine herds to demonstrate freedom from PRRSv infection. The design of efficient active surveillance sampling schemes is challenging because optimum surveillance strategies may differ depending on infection status, herd structure, management, or resources for conducting sampling. Here, we present an open web-based application, named ‘OptisampleTM’, designed to optimize herd sampling strategies to substantiate freedom of infection considering also costs of testing. In addition to herd size, expected prevalence, test sensitivity, and desired level of confidence, the model takes into account the presumed risk of pathogen introduction between samples, the structure of the herd, and the process to select the samples over time. We illustrate the functionality and capacity of ‘OptisampleTM’ through its application to active surveillance of PRRSv in hypothetical swine herds under disparate epidemiological situations. Diverse sampling schemes were simulated and compared for each herd to identify effective strategies at low costs. The model results show that to demonstrate freedom from disease, it is important to consider both the epidemiological situation of the herd and the sample selected. The approach illustrated here for PRRSv may be easily extended to other animal disease surveillance systems using the web-based application available at http://stemma.ahc.umn.edu/optisample.


Inputs and outputs of 'OptisampleTM'
The model formulation and parameterization are intended to estimate the probability of freedom from disease (PFree) by taking into account the sampling strategy as well as characteristics of herd demography and disease epidemiology. Key parameters are estimated based on observational data regarding disease occurrence in the past. Demographic and epidemiologic features are specified via the herd size (N), the start of the observation period (hd), the end of the observation period (cd), the number of outbreaks that occurred during the period of observation (n_ou), the expected duration of pathogen persistence in the herd (p_p), the time span between two outbreaks (f_ou) and the correlation between successive sampled groups for the pathogen prevalence (ICC_bt). The sampling strategy is specified via the expected prevalence to detect (P*), the frequency of testing (f_t), the number of tested samples (n_t), and the sensitivity (se_test) and cost (Price_test) of the laboratory test. Notably, the test specificity is assumed to be perfect (i.e. 100%) because, in the event of a positive result, it is expected that an exhaustive epidemiological investigation would take place at the herd and sufficient samples would be collected and tested to rule out false positive results.
If all tests are negative, 'OptisampleTM' provides as outputs the PFree and the overall cost of testing (Cost_t). Based on the representativeness of the sample of the herd and the pathogen distribution within the herd, the model estimates the PFree for each herd by simulating two scenarios (named S and D). In scenario S, it is assumed that the pathogen is homogeneously distributed and that a representative random sample is always selected from the herd over time. In contrast, in scenario D, it is assumed that the pathogen distribution is heterogeneous among different sub-units of the herd and that the sampled sub-units vary over time. This second scenario may arise from the demographic structure, biosecurity measures, management of the farm and logistics of sample collection. An example of this situation is found in sow herds, where, to determine the PRRSv status of the herd, producers often sample only piglets that are about to be weaned. In these cases each sampling is conducted in a different sub-unit of animals. The PFree is estimated after each sampling t for both scenarios S and D (PFree_{S,t} and PFree_{D,t}) over a time frame of 12 days, 12 weeks or 12 months. The probability of freedom over this time frame is approximated by computing the area under the curve, which is referred to as AUC_S or AUC_D according to the corresponding simulated scenario.
Parameters used in 'OptisampleTM' are shown in Table 1.

Modelling process
The time frame assessed by the model (i.e. 12 days, 12 weeks or 12 months) depends on the frequency of consecutive testing (f_t) set by the user (daily, weekly or monthly). The model automatically scales all the inputs to days, weeks or months according to f_t to compute all the outputs. The modelling process comprises the following steps.
1. To estimate the probability that the herd is infected before conducting any sampling (PI_{t=0}), the model makes use of four inputs: (1) the start of the observation period (hd), (2) the end of the observation period (cd), (3) the number of outbreaks that occurred during the period of observation (n_ou), and (4) the expected duration of pathogen persistence in the herd in the event of an outbreak (p_p). Here the value of p_p is defined using a continuous uniform distribution [12] bounded by the minimum and maximum expected durations. To describe the uncertainty and variability of PI_{t=0}, the model computes its value using a Beta distribution with parameters α and β [13] automatically derived from hd, cd, n_ou and p_p, where n_ou × p_p corresponds to the period of time in which the pathogen may persist in the herd, and (cd − hd) − (n_ou × p_p) corresponds to the total period of time with available information during which there is no pathogen persistence.
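As an illustration of step 1, the following Python sketch draws values of PI_{t=0} from a Beta distribution built from the time with and without pathogen persistence. It is a minimal sketch under stated assumptions, not the code of the application (which was written in R): the "+ 1" offsets in α and β, the capping of the persistence time at the length of the observation period, and the function name `sample_prior_infection` are all assumptions made for this example.

```python
import random


def sample_prior_infection(observed_time, n_outbreaks, pp_min, pp_max, rng):
    """Draw one value of PI_{t=0} from a Beta distribution.

    observed_time is cd - hd, in the same time unit as pp_min and pp_max.
    ASSUMPTION: alpha = time with persistence + 1,
                beta  = time without persistence + 1.
    """
    # Persistence per outbreak, uniform between the minimum and maximum.
    p_p = rng.uniform(pp_min, pp_max)
    # ASSUMPTION: persistence time cannot exceed the observed period.
    infected_time = min(n_outbreaks * p_p, observed_time)
    free_time = observed_time - infected_time
    return rng.betavariate(infected_time + 1.0, free_time + 1.0)


rng = random.Random(1)
# Five years of follow-up without outbreaks, persistence of 147 to 231 days.
low_risk = [sample_prior_infection(5 * 365, 0, 147, 231, rng) for _ in range(5000)]
# No historical information at all.
no_data = [sample_prior_infection(0, 0, 147, 231, rng) for _ in range(5000)]
print(sum(low_risk) / len(low_risk), sum(no_data) / len(no_data))
```

Under these assumptions, a herd without any historical information yields Beta(1, 1), whose mean is 0.5, whereas a long observation period without outbreaks concentrates PI_{t=0} close to 0.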
2. At time t = 1 a first sampling is conducted on a number of animals (n_1) using a given diagnostic test. The probability of detecting at least one infected animal if the herd is infected (Se_{t=1}) is estimated considering n_1, the minimum proportion of infected animals that we would expect within the herd if the disease were present (P*), the size of the herd (N), and the sensitivity of the diagnostic test (se_test). The value of P* is a fixed value between 0 and 1 set by the user based on market requirements or accreditation purposes. The se_test is expressed as a Pert distribution [14] with possible values ranging between 0 and 1.
The values of se_test may be determined based on the information provided by the veterinary diagnostic laboratory that processes the samples or based on available scientific references. If se_test is reported as a 95% confidence interval, the user could set the 2.5th and 97.5th quantiles as proxies for the boundary parameters.
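The Pert-distributed test sensitivity described above can be sketched in Python as a rescaled Beta distribution. This is an illustrative sketch only: the function name `sample_pert`, the use of the point estimate as the mode, and the shape value of 4 (the usual default of the Pert distribution) are assumptions for this example.

```python
import random


def sample_pert(minimum, mode, maximum, rng, shape=4.0):
    """Draw one value from a Pert distribution via a rescaled Beta draw."""
    if not minimum <= mode <= maximum:
        raise ValueError("expected minimum <= mode <= maximum")
    if minimum == maximum:
        return minimum  # degenerate case: no uncertainty
    span = maximum - minimum
    alpha = 1.0 + shape * (mode - minimum) / span
    beta = 1.0 + shape * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)


rng = random.Random(7)
# Sensitivity of 98% with a 95% confidence interval of 97% to 99%,
# using the interval limits as proxies for the boundary parameters.
se_draws = [sample_pert(0.97, 0.98, 0.99, rng) for _ in range(5000)]
print(min(se_draws), sum(se_draws) / len(se_draws), max(se_draws))
```

Using the confidence limits as hard bounds slightly understates the uncertainty of se_test, because values outside the interval can never be drawn.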
The Se_{t=1} is calculated using a hypergeometric approximation based on the approach proposed by Cameron and Baldock (1998) [15,16].
3. If all the samples at t = 1 tested negative, the model estimates the PFree_{t=1} by simulating the scenarios S (PFree_{S,t=1}) and D (PFree_{D,t=1}). To assess the influence of selecting different sub-units over time, 'OptisampleTM' includes a parameter that represents the correlation between successive sampled groups for the pathogen prevalence (ICC_bt). In scenario S, where the pathogen is homogeneously distributed throughout the herd and the sampling is conducted over time in a unique and representative group of animals of the whole herd, the value of ICC_bt is equal to 1. Thus, the PFree_{S,t} estimated from a specific sampling can be directly extrapolated to the rest of the herd. Here, the PFree_{S,t=1} is computed using a Bayesian inference approach that considers the PI_{t=0} and the Se_{t=1} [17] as follows:
$$\mathrm{PFree}_{S,t=1} = \frac{1 - \mathrm{PI}_{t=0}}{1 - \mathrm{PI}_{t=0} \times \mathrm{Se}_{t=1}}$$
In scenario D the spread of infection within the herd differs among animal groups. The sub-units are usually interrelated, but are not exactly the same in terms of pathogen distribution. In this scenario, the estimates can only be partially extrapolated to the successive groups according to the value of ICC_bt, defined as a continuous uniform distribution [12] that can take values between 0 and 1 considering the structure and management of the herd. In these cases the PFree_{D,t=1} in successive groups or sub-units of the same herd is computed as a function of ICC_bt.
4. Once PFree_{S,t=1} and PFree_{D,t=1} are estimated, the model calculates the probability of having overlooked the disease in each respective scenario S and D (named PI_{S,t=1} and PI_{D,t=1}) as:
$$\mathrm{PI}_{S,t=1} = 1 - \mathrm{PFree}_{S,t=1}, \qquad \mathrm{PI}_{D,t=1} = 1 - \mathrm{PFree}_{D,t=1}$$
5. However, there also exists the possibility of pathogen incursion between consecutive samplings (PI_bt). This value is highly variable and uncertain, and mainly depends on trade movements, biosecurity measures, proximity to other infected farms and environmental viability. In this first version of the model, to facilitate programming and computation and to give a better understanding of the influence of PI_bt on the results, the parameter is assumed to be constant over time. The value of PI_bt is automatically derived from historical data using the minimum and maximum time span between outbreaks that occurred in the herd (named f_o(min) and f_o(max)) and the minimum and maximum periods of time in which the pathogen may persist in the herd (named p_p(min) and p_p(max)). In the event of no data, the user can set the minimum and maximum values considered by the model. The application computes the value of PI_bt as a Pert distribution [14] parameterized from these minimum and maximum values.
6. From the PI_bt and the respective values of PI_{S,t=1} and PI_{D,t=1}, the model computes for each scenario the overall probability that the herd is infected before the second sampling (PItot_{S,t=1} and PItot_{D,t=1}) [17] as follows:
$$\mathrm{PItot}_{S,t=1} = \mathrm{PI}_{S,t=1} + \mathrm{PI}_{bt} - \mathrm{PI}_{S,t=1} \times \mathrm{PI}_{bt}$$
$$\mathrm{PItot}_{D,t=1} = \mathrm{PI}_{D,t=1} + \mathrm{PI}_{bt} - \mathrm{PI}_{D,t=1} \times \mathrm{PI}_{bt}$$
7. For each consecutive sampling t, the model follows a process analogous to the previous calculations (steps 2 to 6) to compute the values of Se_t, PFree_{S,t}, PFree_{D,t}, PI_{S,t}, PI_{D,t}, PItot_{S,t} and PItot_{D,t}, where t varies from 2 to 12.
8. The previous steps estimate the PFree_{S,t} and PFree_{D,t} after each sampling t. The area under the curve (AUC) is computed over all sampling events to estimate the overall probability of being free of infection. Here, the AUC is an integrated metric of the confidence of disease freedom for all the periods. Its computation uses the sum of consecutive values of the respective PFree_t obtained from all consecutive samplings in each scenario based on the trapezoidal rule [18], where Δt represents the elapsed time between consecutive samplings.
The AUC value ranges between 0 and 1 and indicates the probability that the herd was free from infection throughout the assessed period, being 1 if PFree = 100% and 0 if PFree = 0%. Depending on the scenario S or D, AUC is denoted as AUC_S (for homogeneous pathogen distribution and random sampling over time) or AUC_D (for heterogeneous pathogen distribution and sampled sub-units varying over time). The AUC is reported as minimum, median and maximum values, taking into account the ranges of the inputs previously set in the model.
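The sequence of calculations for scenario S (herd sensitivity, Bayesian updating after negative results, incursion between samplings, and the AUC) can be sketched in Python with fixed input values. This is a hedged illustration rather than the implementation of the application: the form of the hypergeometric approximation, 1 − (1 − se_test × n/N)^(P* × N), is an assumed, commonly used expression for a known population size; dividing the trapezoidal area by the total elapsed time so that the AUC lies between 0 and 1 is an assumption; and the function names and the incursion probability of 0.02 are hypothetical.

```python
def herd_sensitivity(n, herd_size, design_prevalence, se_test):
    """ASSUMED hypergeometric approximation for a known herd size."""
    infected = max(1, round(design_prevalence * herd_size))
    return 1.0 - (1.0 - se_test * n / herd_size) ** infected


def pfree_over_time(prior, samples, herd_size, design_prevalence, se_test, p_intro):
    """PFree after each sampling in scenario S, all results being negative."""
    p_inf = prior
    pfree = []
    for n in samples:  # n = 0 means no sampling in that period
        se = herd_sensitivity(n, herd_size, design_prevalence, se_test)
        denominator = 1.0 - p_inf * se
        # Bayesian update with perfect specificity.
        free = (1.0 - p_inf) / denominator if denominator > 0 else 0.0
        pfree.append(free)
        overlooked = 1.0 - free
        # Infection overlooked or introduced before the next sampling.
        p_inf = overlooked + p_intro - overlooked * p_intro
    return pfree


def normalised_auc(values, dt=1.0):
    """Trapezoidal area, ASSUMED to be divided by the total elapsed time."""
    area = sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))
    return area / (dt * (len(values) - 1))


# Herd of 3,000 animals, design prevalence 5%, test sensitivity 0.98,
# 30 samples per month, prior of 0.5, hypothetical incursion risk of 0.02.
monthly_pfree = pfree_over_time(0.5, [30] * 12, 3000, 0.05, 0.98, 0.02)
print([round(p, 3) for p in monthly_pfree], round(normalised_auc(monthly_pfree), 3))
```

Setting an entry of `samples` to 0 reproduces a period without sampling: the herd sensitivity becomes 0 and PFree falls by the incursion risk alone.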
10. Finally, the model computes the cost of testing (Cost_test). The model sums all the samples tested over time and multiplies this value by the cost of each individual test (Price_test) provided by the user.
Visualization procedure

Development environment
'OptisampleTM' was developed using the statistical software R [19], with RStudio [20] as the integrated development environment. The package 'shiny' was used to build the interactive web application [21]. The package 'mc2d' was used to compute the Pert distributions of se_test and PI_bt [14], and the package 'RSurveillance' was used to calculate Se_t using the hypergeometric approximation assuming a known population size [16].

Simulation of scenarios
To illustrate the functionality of 'OptisampleTM', we estimated and compared the PFree of PRRSv using different sampling schemes in three hypothetical swine herds located in regions with disparate epidemiological situations and infection status. The status of these herds was determined considering PRRSv shedding and exposure according to the standardized terminology defined by the American Association of Swine Veterinarians [22]. In all scenarios we aimed at detecting a hypothetical design prevalence of 5%, a common threshold used to eliminate PRRSv from herds by herd closure [23]. The three herds had some features in common: herd size, expected duration of pathogen persistence in the herd in the event of an outbreak, and the correlation between successive sampled groups for the pathogen prevalence. The herd size was 3,000 animals. The minimum and maximum values of pathogen persistence for PRRSv, based on previous studies, were set at 147 and 231 days [24]. The correlation between successive sampled groups for the pathogen prevalence used to simulate scenario D ranged between 0.5 and 0.7.
Herd A was a multiplier herd with very low incidence of PRRSv (i.e. one outbreak every 5 or 6 years). This herd had a negative infection status (IV). The disease status in this herd had been followed for the last 5 years and no outbreaks had occurred during this period. No pigs had been introduced recently, the level of biosecurity was high, and the number of pig movements into other farms was relatively small. The objective here would be to demonstrate that herd A was free from PRRSv with a 95% confidence level by testing individual sera with a commercial PRRSv antibody ELISA kit with a sensitivity of 98% (97%-99%) [25-26]. We assumed a price of USD 5 per serological test.
Herd B was a commercial herd with negative infection status (IV) and a high incidence of PRRSv (i.e. one outbreak every 2 to 3 years). The farm had introduced new pigs. The disease status of the herd of origin was unknown and, due to the lack of information, the model automatically set PI_{t=0} to 0.5. The aim here would be to demonstrate that herd B was free of PRRSv infection with a 95% confidence level using the same commercial PRRSv antibody ELISA kit used for herd A.
Herd C was a commercial herd with medium incidence (i.e. one outbreak every 3 to 4 years). This herd was classified as positive stable undergoing elimination, according to the RT-qPCR positive results at weaning (II-B). Here the objective would be to ensure that the infection had been eliminated. Serum samples were tested using a PRRSv RT-qPCR with a sensitivity of 98% (97%-99%) as described elsewhere [27]. In this case there was evidence that the herd had been recently infected; thus, to estimate PI_{t=0}, the user could set hd to the initial date of the outbreak (here, as an example, we set it to two months earlier) and set the number of outbreaks that occurred since this date to 1. We assumed a hypothetical cost of USD 10 per molecular test.
Three sampling schemes conducted over the course of a year were assessed for each of the herds. In sampling scheme I, 30 samples were collected per month in each herd. In sampling scheme II, 50 samples were collected per month in each herd. In sampling scheme III, the strategy varied in each herd (i.e. IIIa, IIIb and IIIc) to achieve an approximately 95% probability of being free of infection at lower cost in scenario S (i.e. homogeneous distribution of the infection in the herd) (Table 2).

Results
The probability of being free from PRRSv infection for herds A, B and C after conducting consecutive samplings over one year, together with the costs of testing, is shown in Table 2 and plotted in Figs 2-4.
The comparison of AUC outcomes when sampling 30 animals per month illustrated that the confidence of being free from PRRSv over the entire period decreased as the probability of being infected initially or between successive samplings increased, with median values of 0.97 for herd A, 0.89 for herd B, and 0.82 for herd C.
AUC_D results for herds A, B and C indicated a marked decrease in confidence if the pathogen was assumed to be heterogeneously distributed between sub-units in the herd. In these scenarios, substantiating freedom from PRRSv would require a substantial increase in sampling pressure, almost doubling the number of samples over time (see scheme II for herds A and B).
The results of herd C showed that, to demonstrate freedom from infection when the risk of being initially infected was high, it was necessary to substantially increase the sample size during the first samplings. For herd C, due to the cost of the molecular test, the final budget was higher than for herds A and B.
The plots in Fig 2, Fig 3 and Fig 4 allowed comparison of the monthly results computed over time for the different scenarios. These outputs showed how influential the initial probability of infection was in demonstrating freedom from infection over time. The impact was most evident in herd C. Furthermore, the patterns in the figures demonstrated the ability to substantiate disease freedom over time based on cumulative information obtained from previous samplings, and how the probability of infection between consecutive samplings might affect these estimates. Here, the lines connecting the monthly estimates showed an increase in those months in which samplings were conducted and a decrease in those months in which there was no sampling (see scheme IIIa).

Discussion
Prevention, control, and eradication of infectious animal diseases at herd level require access to up-to-date information on the infection status of the herd. Most of this information is obtained from periodic samplings, usually conducted following predetermined schemes. For example, for PRRSv, a relatively common practice in sow herds intending to demonstrate freedom from infection is to test serum samples from 30 weaned pigs. When no RT-qPCR positive results are obtained in four consecutive samplings, it is estimated with 95% confidence that the PRRSv prevalence in the herd is below 10% and the herd is considered free from PRRSv infection. The outputs of our model demonstrate how this strategy is affected by the epidemiological situation of the herd; hence, a general strategy to demonstrate disease freedom may not serve equally well in different epidemiological situations. The modelling approach presented here allows inputs to be introduced taking into account their variability and uncertainty, and allows the influence of different determinants on the probability of freedom from infection to be assessed. The model illustrates the importance of checking the health status of animals on arrival to maximize the likelihood that introduced animals are not infected. If the probability of being free at the start is low or there are no historical data available to determine PI_{t=0} (here, by default, PI_{t=0} = 0.5), the model shows that, to demonstrate freedom from PRRSv, we need to test a higher number of samples during the initial samplings, compared to other scenarios (herds C and B versus herd A). Moreover, depending on the initial infection status of the herd, the risks of disease introduction, the impact of the disease and the immediate aims to achieve, the sampling and testing protocol might be different.
For example, in herd A, which had not introduced pigs and had only a small number of movements among herds, the risks of infection initially or between consecutive samplings were low. In this herd very early diagnosis was not as important as it was in herd B. On the other hand, if new animals had been introduced into the herd at t = 0 or between consecutive samplings, the probability of being infected initially or between consecutive samplings varied, influencing the distribution of the pathogen in the herd. In contrast, in herd B, there were many animal movements, and we might be interested in detecting lower levels of prevalence than in herd A; thus, the early detection of viraemic animals may be more critical. In this case, the use of an RT-qPCR test to detect viraemic animals at earlier stages, the selection of a lower P* and an increase in the sampling frequency could be more appropriate. In this sense, 'OptisampleTM' might help to assess the probability of being free over time by adjusting the hypothetical prevalence, test sensitivity and sampling frequency. Indeed, when the values of P* or se_test are lower, larger sample sizes are required to demonstrate that the herd is free from disease.
The probability of being free over time also depends on the risk of incursion between consecutive samplings, as demonstrated by the model outputs. Such risk varies according to biosecurity measures in place, the frequency of direct or indirect contacts with other infected herds, and the viability of PRRSv in the environment. When the risk of disease introduction between consecutive samplings is low, the previous negative outputs also provide cumulative information to substantiate that the herd is free from infection. As a result, the lag between samplings may be extended while maintaining a high confidence in disease freedom (see scheme IIIa for herd A). In contrast, when the probability of incursion between samplings is high, the probability of being free over time becomes low and the frequency of samples should not decrease (see schemes IIIa and IIIb).
'OptisampleTM' also illustrates the importance of sample selection. To the best of the authors' knowledge, previously available software [28, 29] for calculating the sample size needed to detect infection assumes that, in the event of infection, the pathogen will be homogeneously distributed across the herd. However, from our model outputs, it seems evident that, if the groups sampled are heterogeneous and different sub-units of animals are sampled over time, the confidence of disease freedom decreases dramatically. The value of ICC_bt may be challenging to estimate, given that this parameter depends on the management and structure of each farm. Thus, to obtain plausible values for each case, we would require a specific model to assess the pathogen spread within each herd. Still, we believe the inclusion of this parameter demonstrates the importance of assessing the process of sample selection to substantiate freedom from disease.
As a limitation of the model, it is important to remark that in this initial version of 'OptisampleTM', to facilitate programming, computation, and a better understanding of the process, the herd size and the risk of incursion between consecutive samplings are set as constant values throughout the entire study period. However, in those herds in which the herd size or the risk of pathogen incursion varies over the period of study, such an assumption may lead to biased results, and thus the interpretation of outcomes may be misleading. A potential extension in future versions to improve the accuracy of outputs would be to estimate the risk of incursion between samplings according to season or other associated factors using available continuous information from each herd.
In summary, the work here illustrated a novel approach to enhance the design of active surveillance for PRRSv at herd level. Additionally, the approach here, including its principles and methods, may be easily extended to other surveillance contexts for a variety of species and animal diseases. This freely available application contributes to assessing the importance of the main factors affecting the probability of disease freedom at herd level, ultimately supporting management decisions to prevent and mitigate the impact of animal diseases on susceptible populations.