Time series analysis of bovine venereal diseases in La Pampa, Argentina

The venereal diseases bovine trichomoniasis (BT) and bovine genital campylobacteriosis (BGC) cause economic losses in endemic areas like La Pampa province in Argentina where beef cattle are usually extensively managed. This study used data compiled between 2007 and 2014 by a Provincial Program for the Control and Eradication of venereal diseases in order to develop and analyze retrospective models of time series for BT and BGC. Seasonality and long-term trend were explored with decomposition and simple regression methods. Autoregressive Integrated Moving Average models (ARIMA) were used to fit univariate models for the prevalence and persistence of BT and BGC. Autoregressive Integrated Moving Average with Explanatory Variable models (ARIMAX) were used to analyze the association between different time series, replacement entries and herd samplings. The prevalence and persistence of BT and BGC have decreased from 2007 to 2014. All the BT and BGC time series are seasonal and their long-term trend is decreasing. Seasonality of BT and BGC is similar, with higher rates of detection in autumn-winter than is spring-summer. Prevalence and persistence time series are correlated, indicating their changes are synchronic and follow a similar time pattern. Prevalence of BT and BGC showed the best fitting with the ARIMA (0,0,1)(0,1,1)12 model. While for persistence of BT and BGC, the best adjustment was with the same model with no seasonal difference where the current number of cases depends on the moving averages of the month and the previous season. Including covariates improve the fitting of univariate models, in addition, estimations using ARIMAX models are more precise than using ARIMA models. The time distribution of the samplings could be increasing the false negative ratio. According to the obtained results, the ARIMA and ARIMAX models can be considered an option to predict the BT and BGC prevalence and persistence in La Pampa (Argentina).


Study area and population
The study area was the province of La Pampa in Argentina, which included around 6% of the total cattle population of Argentina [11]. La Pampa is located in the geographical center of Argentina and covers an area of 143,440 km 2 ; approximately 5.2% of Argentina. Cattle production in La Pampa is typically extensive and involves two main production systems: herds that produce calves for fattening establishments (breeding herds), and herds where breeding, rearing and fattening are carried out on the same premises (full-cycle herds).
The study population consist of all herds tested (from 2,000 to 6,000) under the Control and Eradication Program (PCE) from 2007 January 1 st to 2014 December 31 st (Fig 1). La Pampa regulation requires the testing of all non-virgin bulls existing in a herd for BT and BGC prior to interprovincial or intraprovincial movement of any type of cattle (breeding bulls, nonbreeding bulls, cows, calves) to another herd, feedlot or slaughterhouses [12]. Therefore, the study population corresponds to all herds existing in La Pampa between 2007 and 2014, except those few herds without animal movements during this period.
All non-virgin bulls in La Pampa are tested twice a year as part of PCE. The methodology of sampling and diagnosis has been thoroughly described by Molina et al. [21]. A bull was classified as negative if all results in two consecutive tests were negative, and positive if at least one test yielded positive results [22]. Herds with at least one positive bull were classified as positive.

Data
The present study used the data compiled and reported by PCE, and added on a monthly basis from 1 st January 2007 to 31 st December 2014. Prevalence and persistence of BT and BGC was analyzed. Prevalence is defined as the proportion of positive herds over the total tested herds. Persistence is defined as the proportion of positive herds in year n that were also positive in year n-1 over the total tested herds. The following variables were evaluated as possible risk factors, also added monthly: tested herds, tested herds persistent to BT, tested herds persistent to BGC, entry of breeding bulls for replacement, entry of breeding cows for replacement, and entry of heifers for replacement.

Decomposition methods
Decomposition methods are used to obtain seasonality and long-term trend that characterize the behavior of the time series: Seasonality of the time series is expressed using seasonal indices, calculated as the monthly average of the ratio between the monthly value and the annual monthly average value [23]. The seasonal index varies between 0 and 1 when the monthly value is lower than the medium level, and it is higher than 1 in the opposite case. Spearman's rank correlation coefficients were calculated in order to analyze the seasonal relationship between time series.
The seasonality of time series is removed dividing each value of the series by its corresponding seasonal index, obtaining the non-seasonal time series. The relationship between the nonseasonal time series (dependent variable) and time (independent variable) was modeled using simple linear least square regression in order to obtain the long-term trend [14]. Different alternative models were built and evaluated. The fitting of the model was determined by the coefficient of determination, (R 2 ) and it was contrasted performing analysis of variance (ANOVA) [24].

ARIMA model
ARIMA model is a univariate analysis procedure widely used to model the occurrence of infectious diseases. ARIMA model results from a differencing process based on autoregressive moving average model (ARMA), which expresses the actual value as a linear combination of the previous value (autoregressive component) and the residual series (moving average) [25]. ARMA model can be expressed as: where y(t) denotes the value of the series in the time t; ε(t) the residual through time t; φ and θ are its corresponding coefficients. ARIMA model is expressed as ARIMA (p, d, q) x (P, D, Q)s, where p, d and q are nonnegative integers referring to the order of the autoregressive, integrated and moving average parts of the model respectively; whereas P, D and Q represent the order of the seasonal autoregressive, differencing and moving average respectively. The subscripted letter "s" denotes the seasonal period length.
The ARIMA modeling procedure was introduced by Box and Jenkins, and consists of three iterative steps: identification, estimation and verification of diagnosis [26]. Before the identification step, data must be stationary, which can be achieved by performing an appropriate seasonal difference in addition to the regular difference of the ARIMA model. Stationarity can be tested using the method developed by Dickey-Fuller [27]. The identification step includes the process of determining seasonal and non-seasonal orders using the autocorrelation and partial autocorrelation functions of the differenced series [18,28]. Parameters in the model are estimated using the method of conditional least squares after the identification step [29]. Finally, the adequacy of the established model for the series is verified by performing white noise test to check if the residuals are independent and normally distributed [30]. It is possible that several ARIMA models can be identified, so it is required to choose an optimum model. This optimum model is generally determined based on the Akaike information criterion (AIC) and on Schwartz Bayesian criterion (SBC) [31].

ARIMAX model
In contrast with the ARIMA model, where the previous values of the dependent variable (AR) and residual series (MA) are taken after the differencing process, the ARIMAX model is an extension of ARIMA modeling incorporating an independent explanatory variable [20]. ARI-MAX model can be expressed as follows: yðtÞ ¼ b x ðtÞ þ φ y ðt À 1Þ þ Á Á Á þ φ p yðt À pÞ þ εðtÞ À y 1 ðt À 1Þ À Á Á Á À y q εðt À qÞ ð3Þ where x(t) denotes a covariate at time t; β denotes its associate coefficient; y(t − 1). . .y(t − p) denotes the previous values, and ε(t). . .ε(t − q) are the residual series. For each time series, all the rest were used as covariate. Non-significant covariates (p>0.05) were removed from the model. As well as in ARIMA, several ARIMAX models can be identified. The selection of the optimum ARIMAX model is also based on AIC and SBC.
Before fitting the ARIMAX model, cross-correlations between the different analyzed time series were calculated [32]. The ARIMAX model hypothesizes that the time series y(t) is related to past lags of the x(t + m) series. The cross-correlations can help to identify lags of the x-variable that could be used as predictors of variable y(t). The value of cross-correlation is between -1 and +1. Absolute values closer to 1 indicate a higher correlation between both series.
ARIMA and ARIMAX models were fitted to the series of prevalence and persistence of BT and BGC from 2007 to 2013 and tested by estimating prevalence and persistence for 2014. All statistical analyses were performed with a signification level of alpha 0.05 using the software SPSS version 15.0.

Results
Seasonal indices were obtained from the original time series and reveal the seasonality of the testing of herds, replacement entries, BT and BGC prevalence and persistence (Table 1). Correlations between the seasonal indices are shown in Table 2.
The seasonality of the testing of the herds and the testing of BT and BGC persistent herds is considerably high, and it is concentrated from August to October, with a seasonal peak in September. The three-time series follow a similar seasonal pattern with correlation coefficients over 0.90. The seasonality of the entry of breeding bulls for replacement is also quite pronounced and it is concentrated from August to November, with a seasonal peak in October. Bulls entered tend to coincide with the testing of the herds, with correlation coefficients between 0.60 and 0.86. The entry of breeding cows for replacement is less seasonal than for breeding bulls and is concentrated from April to August, while the entry of heifers for replacement is slightly seasonal.
The prevalence of BT and BGC follows a similar seasonal pattern, with a correlation coefficient of 0.95. Both diseases have a higher prevalence in autumn to winter (May-July) than in spring to summer (October-December), with a seasonal peak in May. No more significant correlations were identified.
Persistent cases of BT and BGC are more frequent in autumn-winter than in springsummer, with a seasonal peak in April. Both series follow a similar seasonal pattern, with a correlation coefficient of 0.86. Persistence of BT and BGC tends to coincide with the entry of breeding cows for replacement, while the entry of breeding bulls for replacement shows a negative seasonal trend with the persistence of BT.
The long-term trend for each disease is obtained through simple regression models for the time series removing seasonality. The model, the estimation of the coefficient and the adjusted coefficient of determination (R 2 ), are shown in Table 3. The best fitted model was BGC persistence (R 2 = 79.56), while BT persistence model did not show good fitting (R 2 = 28.83). Prevalence models showed an acceptable and intermediate adjustment. The trends of persistence and prevalence of BT and BGC are decreasing. Prevalence of BGC decreases slower than prevalence of BT, while persistence of BGC decreases faster than persistence of BT.
ARIMA models were established for the four-time series from 2007 to 2013. Models were tested by estimating values for the year 2014. The orders of the autoregressive process and moving average were determined from the autocorrelation and partial autocorrelation functions [25]. Several ARIMA models were adjusted. Results from the estimations are displayed in Table 4. The ARIMA model finally selected for each series is highlighted and corresponds to that which minimizes the Akaike Information Criterion (AIC) and the Schwartz Bayesian criterion (SBC). Prevalence of BT and BGC present the same ARIMA pattern. The model that better adjusted both series is ARIMA (0,0,1)x(0,1,1) 12 . This implies that after a seasonal differentiation, time t prevalence is relevant for the residual at times t, t-1, t-12 and t-13. Persistence of BT and BGC also present the same ARIMA pattern, although with no seasonal differentiation.
Cross correlations between different time series were calculated. The highest correlation coefficients and lags are shown in Table 5. The highest values appear when lags are 0 except for the entrance of breeding bulls, which shows a strong positive correlation with testing of herds series in lag -1. Testing of herds and testing of persistent BT and BGC herds show high and

Time series
Model positive correlations with each other. The prevalences of BT and BGC are correlated, and the persistence and prevalence of each disease are also correlated, although poorly. Results from ARIMAX models for the four-time series are presented in Table 6. The best ARIMAX model is highlighted in Table 6 according to the AIC and SBC criteria. ARIMAX models obtained smaller AIC and SBC than their corresponding univariate ARIMA models, which indicates that the fitting performance improved with the inclusion of the covariate variables. The fitting and prediction of prevalence and persistence of BT and BGC for the ARIMA and ARIMAX models are plotted in Figs 2, 3, 4 and 5.
Root mean squared error (RMSE) between the observed value of the raw series and the estimated values obtained through the ARIMA and ARIMAX model were compared (Table 7). All RMSE for ARIMAX model were lower than the ARIMA model in the modeling process and point estimation.

Discussion
Data collected in La Pampa (Argentina) over 8 years was used to assess the viability of developing time series models using monthly changes in BT and BGC in the province. The rates of prevalence of BT and BGC peaked in March 2008 and February 2007, respectively. Since then they have decreased, demonstrating effective control measures. The effort of PCE to control both diseases has also been able to decrease persistent cases of BT and BGC. The initial prevalence of both diseases was generally lower than those reported in other endemic areas in Asia, Australia, North America, South America and South Africa [4,9,33,34,35,36,37,38,39,40,41,42,43]. A prevalence of 37% for BGC was reported in Uruguay, while no herds were found to be positive to BT [44]. In Buenos Aires (Argentina) a prevalence of 1.5% was reported for   [45]. In the east of La Pampa, a prevalence of 11.1% was found for BT and of 7.0% for BGC [46]. Most of the reported prevalences come from studies conducted with few herds and limited conditions, so it is possible that they do not reflect accurately the situation [7]. For instance, in South Africa, the occurrence of BGC seems to be vastly underestimated [6]. ARIMA models were used to fit univariate time series of prevalence and persistence of BT and BGC. Models have proved to be appropriate to fit the historic prevalence and persistence, and to estimate prevalence and persistence in 2014. All-time series followed the same pattern,  where the present number of cases depends mainly on the moving averages of the previous month and season. ARIMAX models were also fitted to explore correlation between time series of both diseases and potential co-variation factors. The inclusion of covariates improved the adjustment of univariate models. In addition, estimations in ARIMAX models are more accurate than in ARIMA models. ARIMAX models add predictive value in the control of both diseases. Thus, prevalence not only can be estimated using historical data, but also using persistence of the same disease and prevalence of the other one.
Adjusted models have properly captured the trend of prevalence and persistence of BT and BGC, although they predict future occurrence with medium precision. The relatively high RMSEs are due to the small number of cases, in both absolute and relative terms compared to the population monthly tested [20,47]. On the other hand, factors related to herd management were not considered in the models, which can play an important role in exploring the time series relationship among both diseases [36,48]. These data are not currently available, although they could be included in future studies.
ARIMA and ARIMAX models have been typically used in econometrics. However, their use is increasing in the medical field [16,17,18]. Zhang et al. [14] used them to model the background and future incidence of syphilis in China, evidencing a higher performance when different types of syphilis were included as covariates. Soebiyanto et al. [17] modelled the transmission of influenza and obtained a high predictive capacity including climatic factors.  Wu et al. [49] modeled the occurrence of dengue in Taiwan and also obtained better results when including climate variables as covariates.
The main advantage of these models is that they consider seasonal differences, which can be useful to predict sexually transmitted diseases [14]. However, their implementation requires appropriate datasets, as those compiled by epidemiological surveillance systems, which are currently scarce for livestock venereal diseases [7,15]. This could explain that, although several papers that apply time series models in veterinary epidemiology have been recently published (i.e. sylvatic rabies, Hussain et al. [50]; bovine fasciolosis, Silva et al. [51]; Newcastle disease, Mubamba et al. [52]), we have not found any studies analyzing time series of livestock venereal diseases.
Cross correlation between time series of both diseases is high when lag is zero, what indicates that changes are synchronic. Correlations between seasonal indices also indicate that BT and BGC follow a similar seasonal pattern that is generally defined by a higher frequency in autumn-winter than in spring-summer. These results suggest that BT and BGC are bound to similar temporal factors. However, the evaluated co-variation factors related to testing of herds and the entrance of replacement cattle have not been significant predictors in any ARI-MAX model. In addition, no strong or significant correlations have been evidenced with any of the different time series of both diseases.
There are no evident biological or transmission explanations for a lag of 0 between persistence and prevalence of the same disease. It is probably explained by the contribution of the persistent cases to the monthly prevalence, particularly relevant in the early years of the study. This means that we use the same time persistence to estimate the prevalences, potentially decreasing the predictive benefit. In ARIMAX, the covariates can be a burden of modeling if the cross-correlations are not high enough [14]. This is a limitation of the method.
Seasonality of BT and BGC could be related to the reproductive cycle of herds. Although in most of herds the reproduction is handled with a continuous season, mating tends to be concentrated in spring-summer, due to a better forage offer that stimulates the beginning of the estrous cycle. On the other hand, in autumn-winter bulls are usually at reproductive rest. Szonyi et al. [53] found in Texas (USA) a higher prevalence of BT in summer than in winter, which they related to a higher bad quality bull testing frequency in summer. Reproductive inactivity increases the concentration of microorganisms in the preputial fluid and, therefore, facilitates detection of both diseases [21,54]. Consequentially, detection rates fluctuate seasonally with the reproductive cycle, being higher during periods of inactivity.
The reproductive cycle also constitutes a temporary factor that should be considered when planning the testing of herds. In La Pampa, the testing of the herds should be concentrated in autumn-winter in order to reduce the risk of false negatives generated by low concentration of microorganisms in the preputial fluid [21,54]. However, herds are generally tested in spring, when bulls are generally active. Thus, there is a high risk of false negatives generated by this factor. Consequently, the prevalence of both diseases should be higher than reported by PCE; probably close to the prevalence obtained during autumn-winter months.
Another factor that should be taken into account is the entrance of breeding bulls for replacement into herds. Herds should be tested after the entrance of the bulls, even if they were certified as free of venereal diseases, since the risk of false negative due to a low concentration of microorganisms is high. However, the entrance of bulls and the testing of herds occur with a lag of one month. This suggests that tests tend to take place a month before the entrance of breeding bulls into the herds; with no effective knowledge of their sanitary status. In agreement with Ondrak [55], this factor is relevant to explain the spread of both diseases in free herds.
The identified temporal and seasonal patterns should be considered together with risk and handling factors in order to improve the PCE success. Compiled data from PCE previously allowed the identification of spatial patterns of BT and BGC in the province, revealing spatial correlation between the risk of BT and BGC [21]. Other studies have shown that BT and BGC share some of their main risk factors [9,10,36,56,57]. Thus, the development of integrated actions focused on the common features of BT and BGC should improve effectiveness and efficiency of the intervention measures [57].
The identified long-term trends suggest that persistent cases of both diseases are drastically reduced, while the prevalence of BT and BGC has been decreased to reach minimum levels, difficult to reduce further considering the actual conditions of PCE. These results are similar to those achieved by other control plans based on policies of "testing and culling" for bovine venereal diseases. For instance, in Wyoming (USA), a control plan similar to PCE managed to reduce the herd-level prevalence of BT to 1.29% in nine years [58]. In the breed Asturiana de la Montaña (Spain), a voluntary control test plan using the "test and cull" approach managed to reduce herd-level prevalence of BT from 41% to 19.7% in two years [59]. These results suggest that, in agreement with Yao [54] and Collantes-Fernandez et al. [59] diagnostic and slaughter schemes are effective in reducing the occurrence rates. However, the complete elimination of BT and BGC without substantial changes in management appears unlikely, because putative risk factors associated with both diseases are present in the management of farms.
Currently, the sampling schedule is a result of individual decisions of each farmer, who establishes in which moment the herd is sampled. This way of planning is the optimum when the farmer applies productive criteria, since from a sanitary perspective the best moment for test the herd coincides with the best moment from a zootechnical point of view. From a sanitary perspective, the testing of herds should take place during the non-reproductive season, and at least a month after the entrance of breeding cattle into the herd [53,54,55]. This sampling system minimizes the probability of obtaining false negatives and the transmission of both pathogens to free herds. From a zootechnical point of view, this system allows the detection of both diseases before they can cause reproductive failures, which also allows the replacement with free bulls without generating substantial delays in the reproductive cycle.
However, most farmers do not take this approach. Sampling schedule probably responds to the need of testing the herd before the authorization of cattle movement, which in La Pampa is compulsory for any destination, even to slaughterhouse. An important reason for this could be that farmers underestimate the scale of the problem and its impact on the farm economy. In fact, Jiménez et al. [10] showed that the status of BGC is not a sensitive issue for Argentinean farmers. There are currently no global governmental monitoring programs to track the incidence and prevalence of bovine venereal diseases [7]. In general, there are no regulations requiring testing for bulls sold by private treaty or those resident in privately owned herds [55]. Both diseases are typically asymptomatic or subclinical, so it is difficult to take notice of their severity until the occurrence of reproductive failures. In addition, practices such as keeping a reproductive activity record, daily inspection of the herd, pregnancy diagnosis or culling of non-pregnancy cows are not commonly applied, which reduces the possibility of identifying precociously reproductive problems [9,10]. Jointly, all these factors make the global livestock industry underestimate the adverse effects of bovine venereal diseases.
Persistent herds have been dramatically reduced, which can be explained by an increased awareness of the severity of the problem after suffering the disease. Rae et al. [39] reported that farmers familiarized with BT diminished the risk of disease. In Spain, infection by T. foetus has been recently found to increase calving intervals by 79 days and reduce income by 68.7% in herds, while implementation of a voluntary testing and culling policy was effective in improving reproductive efficiency [59]. More than 50% of reproductive failure in Argentina has been estimated to be caused by venereal diseases, although the arising productive and economic consequences have never been formally assessed [60]. In this sense, additional research could be useful to understand the economic and productive consequences of both diseases in the region, as well as to provide evidence of the effectiveness of the intervention measures. On the other hand, the managing authority could plan the sampling jointly with the farmers, at least temporarily, in order to correct the deviations identified in the present study.
In addition to the sampling schedule other factors should be considered to improve the rates of detection and to reduce the risk of false negatives. On one side, in La Pampa two consecutive negative tests are enough to declare a bull as disease-free, while the recommendation is to obtain three or four negatives [22]. On the other side, there are available diagnostic techniques based on PCR that significantly improve the results obtained by the official ones implemented by PCE, and only requiring one test per animal [61,62].

Conclusions
Time series analysis was an effective tool for modeling the historical and future prevalence of T. foetus and C. fetus infections in La Pampa (Argentina). ARIMAX models showed superior performance than ARIMA models for the modeling of BT and BGC prevalences. Results showed that both diseases are seasonal and synchronous, with detection rates in autumn-winter than in spring-summer. Our results also provide valuable data for planning control strategies and making recommendations to farmers and managing authorities. Testing and culling policy was effective in reducing prevalence and persistence of T. foetus and C. fetus. Knowing the status of the bulls before breeding is the first step for developing a sound control plan and the best form of herd surveillance. However, the complete elimination of BT and BGC without substantial changes in management appears unlikely, because putative risk factors associated with both venereal diseases are present in the management of these farms.