Three-Level Mixed-Effects Logistic Regression Analysis Reveals Complex Epidemiology of Swine Rotaviruses in Diagnostic Samples from North America

Rotaviruses (RV) are important causes of diarrhea in animals, especially in domestic animals. Of the 9 RV species, rotavirus A, B, and C (RVA, RVB, and RVC, respectively) had been established as important causes of diarrhea in pigs. The Minnesota Veterinary Diagnostic Laboratory receives swine stool samples from North America to determine the etiologic agents of disease. Between November 2009 and October 2011, 7,508 samples from pigs with diarrhea were submitted to determine if enteric pathogens, including RV, were present in the samples. All samples were tested for RVA, RVB, and RVC by real time RT-PCR. The majority of the samples (82%) were positive for RVA, RVB, and/or RVC. To better understand the risk factors associated with RV infections in swine diagnostic samples, three-level mixed-effects logistic regression models (3L-MLMs) were used to estimate associations among RV species, age, and geographical variability within the major swine production regions in North America. The conditional odds ratios (cORs) for RVA and RVB detection were lower for 1–3 day old pigs when compared to any other age group. However, the cOR of RVC detection in 1–3 day old pigs was significantly higher (p < 0.001) than pigs in the 4–20 days old and >55 day old age groups. Furthermore, pigs in the 21–55 day old age group had statistically higher cORs of RV co-detection compared to 1–3 day old pigs (p < 0.001). The 3L-MLMs indicated that RV status was more similar within states than among states or within each region. Our results indicated that 3L-MLMs are a powerful and adaptable tool to handle and analyze large-hierarchical datasets. In addition, our results indicated that, overall, swine RV epidemiology is complex, and RV species are associated with different age groups and vary by regions in North America.


Introduction
Rotaviruses (RVs) belong to the family Reoviridae and contain 11 segments of double stranded RNA (dsRNA) [1,2]. RVs are classified into nine species A-I (RVA-RVI) based on sequencing of the viral protein 6 (VP6) [1,3,4]. RVs are a major cause of diarrhea in pigs, and five (RVA-RVC, RVE, and RVH) out of the nine species have been found in swine [5].
RVA is considered the most prevalent, pathogenic, and the major cause of diarrhea in pigs [6]. Early studies indicated that 53% of suckling piglets and 44% of weaned pigs were infected with RVA without evidence of any viral shedding after 2 months of age. In addition, sows infected with RVA were able to shed many different viral strains [7][8][9][10]. While the pathogenesis of RVB was established in the 1980s, the revelation of RVB as an important enteric pathogen in pigs was only recently discovered in the United States of America (USA) [11,12]. RVC were first identified in swine and is an important cause of diarrhea in piglets in the USA [5,13,14]. The pathogenesis of swine RVE was established in gnotobiotic pigs although its complete characterization as a RV species is unknown [13]. While the pathogenesis associated with swine RVH is undefined, swine RVH was first identified in Japan and has been recently found circulating in USA and Brazil [15][16][17]. Co-infections of RVA, RVB, and RVC are common in nursery piglets from the USA while a limited number of co-infections for RVA and RVC have been investigated in other countries [6,18,19]. In addition, multiple RV infections can occur within a single swine herd [20], and clinical signs may vary between herds due to strain diversity and/or virulence [21].
Multilevel modeling has been widely used for statistical analysis for more than 50 years [22]. Multilevel modeling incorporates hierarchically demographic information (level) into a single analysis and provides more accurate estimates of effects than conventional fixed-effects modeling. In addition, multilevel modeling allows for multiple comparisons within each level by accounting for the variability within each level [23]. In veterinary epidemiology, multilevel modeling has been used in numerous research investigations involving studies of risk factors for, diarrhea in lambs [24], pre-weaning mortality in goats [25], gastrointestinal diseases in mink [26], Salmonellosis in poultry [27], effects of ketosis on milk production and reproductive problems in dairy cows [28,29], mortality in sows [30], weaned-to-service interval related to seasonal changes in female pigs [31], and deaths related to seasonal changes in peripartum pigs [32].
The Minnesota Veterinary Diagnostic Laboratory (MNVDL) at the University of Minnesota College of Veterinary Medicine is a large-scale diagnostic laboratory and receives swine samples from North America to identify RV infections. These samples include hierarchical data, which allows for multilevel modeling to estimate the association between RV detection and demographic traits (age, state, region, and country). Currently, three major swine production regions in the USA: Midwest, Southeast, and South-central [33,34]. Historically, most swine production systems in the USA were located in the Midwest. After the 1980s, swine populations increased in the Southeast (North Carolina and South Carolina) and the South-central (Oklahoma and Texas) regions, and weaned pigs (21 days of age) are transported to the Midwest and raised until their ready for harvest (5-6 months) since the Midwest is the major producer of the feed supply, corn [34].
In the USA, the associations (odd ratios) for swine RVA, RVB, RVC infections are lacking for different pig age groups as well as the relationship of these infections among the different production regions. Therefore, the objective of this study was to investigate the associations among age, RV detection, and regions within the US swine production in samples submitted for diagnosis to the MNVDL.

Ethic statement
The MNVDL receives animal samples voluntarily submitted by veterinarians or producers in order to determine the causative agent of disease. The MNVDL was not involved in the collection or sampling of the pigs in this study. The MNVDL retains ownership of the samples upon arrival and maintains client(s) confidentiality in public communications by removing any signifiers that would identify the client(s). Client consent is not required if the aforementioned conditions are met.

Samples and RV detection
The MNVDL received 7,508 swine samples between November 2009 and October 2011 to determine the etiological agent of disease from North America continent. Samples were tested by real time reverse transcriptase polymerase chain reaction (RRT-PCR) for swine RVA, RVB, and RVC using methods described elsewhere [5,6] [35]. The Midwest, South-central, and Southeast represent the densities regions of swine production.

Statistical analysis and model selection
The RV infectious statuses: RVA, RVB, RVC, RVAB (A+B), RVAC (A+C), RVBC (B+C), and RVABC (A+B+C) were defined for each sample as binary outcome (positive or negative). Tabular methods were performed to calculate and map the frequency distribution of RV status by age group and region with the R packages maps [36], maptools [37], RColorBrewer [38] and classInt [39]. In addition, percentages of RVA, RVB, and RVC were calculated to investigate seasonality of RV infections using a Locally Weighted Regression model [40]. Graphics were produced with the R package ggplot2 [41]. Statistical analyses were performed the R version 3.2.2 with different packages as aforementioned [42].
Age, as a continuous independent (predictor) variable, was checked for Normality using Kolmogorov-Smirnov test. To avoid linearity assumption of the independent variable, age was categorized into four main groups; (Age1: 1-3 days, Age2: 4-20 days, Age3: 21-55 days, and Age4: >55 days) (S1 Table). If age was missing with the sample, it was categorized as an Age9: "unknown age group". The crude associations between RV status and age groups (Model 1) or regions (Model 2) were measured using univariate logistic regression models, and the 1-3 day age group and non-USA region were used as reference group for each model, respectively. Each logistic regression model assumed that each observation (Y 1 ,. . .,Y 7508 ; Y j ) was independent of positive or negative result. Model 1 assumes Y i~B ernoulli (π1); Logit (π1| Age1 i , Age2 i , Age3 i , Age4 i , Age-unknown i ) = β 1 +β 2 Age2 i +β 3 Age3 i +β 4 Age4 i +β 5 Age-unknown i . Model 2 assumes Y i~B ernoulli (π2); Logit (π2|Midwest i , South-central i , Southeast i , Other USA-states i , non-USA i ,) = β 1 +β 2 Midwest i +β 3 South-central i +β 4 Southeast i +β 5 non-USA i . The β 1 is intercept term while β 2 , β 3 , β 4 and β 5 are logistic regression coefficients, and π1 and π2 are the probability of being RV positive in the Models 1 and 2, respectively. Because the overall p-value was <0.05 in the univariate logistic regression models, all the age and region covariates were included in the mixed-effects logistic regression models.
Due to the hierarchical structure of the data, three-levels mixed-effects logistic regression models (3L-MLMs, Models 3-5) were performed to investigate the association of age groups on RV detection, and the unknown-age group was excluded from the 3L-MLMs analyses. Because the objective of our study was to investigate associations of age, age was a fixed effect while individual samples (i), states (j), and region (k) were random effects [43][44][45]. The 3L-MLMs assumed that observations (Y 111 ,. . .,Y ijk ) are independent of positive or negative result, and Y ijk~B ernoulli (π ijk ), where π ijk are probabilities of positive results for individual samples (i), states (j), and region (k); hence; Logit (π| Age1 i , Age2 i , Age3 i , Age4 i , Region k , U k , W jk ) = β 1 +β 2 Age2 ijk + β 3 Age3 ijk +β 4 Age4 ijk + β 5k Region k + β 2k Age2 ijk :Region k + β 3k Age3 ijk : Region k + β 4k Age4 ijk :Region k +U k +W jk . The grand mean (β 1 ) is the intercept term. The β 2 , β 3 , and β 4 are the fixed-effects logistic regression coefficients corresponding to the three age groups while β 5k are the fixed-effect coefficients at the regional levels (region was assigned to both fixed and random effects in the 3L-MLMs). The β 2k β 3k , and β 4k are the fixed-effect coefficients for interactions between the three age groups for the regional levels. The random intercepts U k and W jk were assumed independent across regions (k) and across states (j) within-the same region (k). The i = 1,. . .I j are the level 1 indicator for the individual samples (i), j = 1,. . .J k are the level 2 indicator for the states (j), and k = 1. . ., K are the level 3 indicator for the region (k) (K = 5, J 1 = 12, J 2 = 2, J 3 = 2, J 4 = 13, J 5 = 5). Model 3 (the full model) include interaction term between age and region while Model 4 exclude the interaction term. Finally, Model 5 excluded region as a fixed component. The model with the lowest pseudo-Akaike Information Criterion (pseudo-AIC) was preferred as the final model (Model 5). The random effects for regions and states were tested using Likelihood Ratio χ 2 test (LR χ 2 ), which were obtained from residual log-Pseudo-Likelihood using the COVTEST function, and the conditional odds (cOR) of RV detection by age groups were compared to the predefined baseline group (1-3 days old).
Univariate analyses were performed with SAS 9.4, and the 3L-MLMs analyses were performed with PROC GLIMMIX (SAS Institute Inc. Cary, NC). The associations were considered significant when the p-value < 0.05.

Results
In this study, 6158 of the 7508 diarrheic swine stool samples (82.0%) tested positive for RVs. The percentage of positive RV samples from the USA, Canada, and Mexico was 81.1% (6072/ 7399), 79.9% (63/79), and 73.3% (20/30), respectively. Of the 6072 USA samples, 3638 (59.9%) were from Minnesota, and 2941 (81%) of these samples were positive for RVs (Fig 1). The percentage of RV positive samples by states ranged from 5.1% to 100.0%, with the lowest and highest percentages found in Florida and Utah, respectively. The percentage of RVA positive samples was higher than RVB and RVC in 14 states (Fig 2). Michigan was the only state to have higher percentage of RVB detection while seven states had higher percentage of RVC detection than RVA and RVB. Interestingly, RVA and RVC positive samples occurred in the same percentage (59%) in Arizona. Co-infections of RVAC were detected in the highest percentage in 12 states (Fig 3). Co-infections of RVABC were detected in the highest percentage in seven states while co-infections of RVAB or RVBC were not dominant in our data set.
Samples were categorized into 5 regions; Midwest (n = 5590), South-central (n = 754), Southeast (n = 288), other USA (n = 767), and non-USA region (n = 109) ( Table 1). While the highest proportion of RVA positive samples (70.1%) was found in the Southeast region, the  To investigate the seasonality of RVA, RVB and RVC detection among swine diagnostic samples, the percentage of positive samples (observed and expected) were plotted over time (Fig 4). Overall, the expected percentage of positive samples was higher for RVA than for RVC and RVB. However, between April and July 2011, the expected percentage of RVA and RVC overlapped. In addition, the expected percentage of RVA detections decreased from 72% to 57% during the study period while the expected percentage of RVB positive samples increased  from 33% to 40%. Nevertheless, the percentage of RVC detections remained relative stable (53% to 51%) throughout the study. Since age as a continuous variable was not normally distributed (p-value <0.01), samples were categorized into five different age groups. Univariate logistics regression models were employed to determine if age and regions (Model 1 and 2, respectively) were risk factors for RV detection, and the RVA, RVB, RVC, RVAB, RVAC, and RVABC crude odds ratios (OR) were estimated (S2 Table). In Models 1 and 2, both age and regions were significant (p-value<0.001, except for RVAB in model 2, p-value = 0.042), indicating both variables should be in the same model. Thus, the 3L-MLMs (Models 3-5) were employed to partition and understand the variability of RV detection ( Table 2). The 3L-MLMs contained a variation of model specifications, including fixed effects of region, random effects of age and regions, and the interaction between age and regions. Model 5, with age as fixed effect and random region and state effects as random effect, had the lowest Pseudo-AIC, indicating it was the preferred final model, and the COVTEST function verified the final model (p-value <0.001).
For the final model, the conditional odds ratios (cORs) of being positive to RV infection by age groups after adjusting for sample source variation were calculated ( Table 3). The cORs for RVA and RVB were lower for pigs in the 1-3 days old compared to the other age groups. However, the odd of RVC detection in the 1-3 day age group was higher than in the 4-20 and the >55 day age groups (p<0.001). Furthermore, pigs in the 21-55 day age group had an increased odds for RV co-detections compared to pigs in the 1-3 day age group. The random effects for the region (γ 2 ) and the states (τ 2 ) were estimated from the assumptions of the final model to calculate the intra-class correlation coefficient of ρ(region) and ρ(States, region) to compare the variability within region and among states. The intra-class correlation coefficient of ρ(States, region) was greater than ρ(region) for RV detection, implying RV status was more similar within states than among states or within regions.

Discussion
To better understand the epidemiology of RV infection in pigs with enteric disease, 3L-MLMs were developed to estimate the association between RV detection by RRT-PCR and age in  (Models 1 and 2) and Pseudo-AIC for three-level mixed-effects models (Models 3-5). 2 Final model is indicated by lowest Pseudo-AIC.
NA means the models did not converge.
doi:10.1371/journal.pone.0154734.t002  veterinary diagnostic samples. The effect of geographical location was incorporated to adjust the associations (conditional odds ratios) of RV detection among age groups. The detection of the different RV species was not evenly distributed within age groups or geographical regions. Understanding the distribution of RV infection among swine populations is important to develop better intervention practices to minimize the effect of RV infections on swine health.
The ecology of RV infections is complex, which has been demonstrated in different animal species [48]. Our results support those findings and indicate that the epidemiology of enteric diseases among swine populations is difficult due to the co-circulation of more than one RV species. While RV co-infections are common and complex, the ecology of RVA, RVB and RVC are different since they are not evenly distributed among pig age groups. In our study, older pigs (4-20, 21-55, and >55 days age groups) had higher cOR for RVA detection compared to piglets in the 1-3 day age group. Moreover, the cOR for RVA detection decreased in the > 55 day age group, which might be contributed to RV exposure and an increase of active immunity in the 21-55 day age group. Compared to RVA studies, RVC research is extremely limited. In our study, a higher proportion of RVC positive samples were present in 1-3 day old piglets, highlighting the significance of RVC on neonatal enteric disease. We hypothesize that the variability of RVC exposure in sows is correlated to the lack passive immunity protection, leaving the 1-3 day old piglets susceptible to a RVC infection. While sows are exposed to RVC via naturally planned exposure events (i.e. feeding RVC infected material to the sows) before farrowing, swine producers still report problems in preventing clinical disease associated with RVC infections in piglets. Hence, further studies are required to understand the development of maternal immunity to RVC, and its effect on preventing infection in piglets.
Multilevel mixed-effects logistic regression models are designed to handle hierarchical structure data sets with binary outcome for a dependent variable and independent variables Multilevel mixed-effects logistic regression models are very versatile and powerful, especially with large data set because inaccurate estimates may be generated if the hierarchical structure (multiple-demographic information) and source of variability is ignored [23]. Fixed-effect logistic regression models for states and regions increase the number of additional parameters, which is equal to the number of higher-level units minus 1 (j-1 for state levels and k-1 for region levels). If the number of parameters (states) is large, estimating the number of nuisance parameters is difficult, which may yield poor estimates [49]. In our data set, state (α ij = α+ α i +-W ij ) and region (α i = α+U i ) effects were treated as random intercepts with specified probabilistic distribution, and the nuisance parameters were not estimated because the analysis provided conserve estimates for the state and region effects [49]. Unsurprisingly, our model indicated variability in RV detection among states. While RV detection was similar to within-regions but not similar among regions, different swine densities in North America may lead to less variability within each region. Furthermore, swine production systems could differ between regions, which may explain the differences among regions. In addition, wind, humidity and temperature vary by states and may affect RV infections in each swine regions, which were considered part of the regional level random effect. Under experimental settings, RV particles were aerosolized, which could be transported between farms by the wind [50]. In addition, transportation of pigs between states was also considered as a regional level random effect in our multilevel mixed-effect logistic regression models. Farm management and production systems (all-in all-out vs. continuous flow swine productions systems) can affect dynamics of viral transmission, infection, and evolution. Dewey and colleagues demonstrated farm management practices, including farm expansion, early weaning, and all-in all-out production affect the dynamics of RV infections [51]. Since the farm management information is lacking with sample submission, these factors were deemed as states level random effects (L2 random intercept) to encompass variations between farm management and production systems.
In summary, RV infections are a significant cause of diarrhea in swine. Determining the RV species associated with clinical disease and estimating the risk of RV infection over time will lead to better intervention tools to minimize the effect on swine health. Due to the large geography and different swine production regions within North America, 3L MLMs were used to adjust for variability in states and regions, and indicated RV status was more similar within states than between states or within region. Piglets in the 1-3 day old age group were less risky to RVA and RVB infection but more risky to RVC infection while associations in the older age group piglets were reversed. Our research indicates the swine RV epidemiology is complex in North America, but one thing is known, RV species are associated with different age groups and varied by regions.
Supporting Information S1