Epidemiological models for predicting Ross River virus in Australia: A systematic review

Ross River virus (RRV) is the most common and widespread arbovirus in Australia. Epidemiological models of RRV increase understanding of RRV transmission and help provide early warning of outbreaks to reduce incidence. However, RRV predictive models have not been systematically reviewed, analysed, and compared. The hypothesis of this systematic review was that summarising the epidemiological models applied to predict RRV disease and analysing model performance could elucidate drivers of RRV incidence and transmission patterns. We performed a systematic literature search in PubMed, EMBASE, Web of Science, Cochrane Library, and Scopus for studies of RRV using population-based data, incorporating at least one epidemiological model and analysing the association between exposures and RRV disease. Forty-three articles, all of high or medium quality, were included. Twenty-two (51.2%) used generalised linear models and 11 (25.6%) used time-series models. Climate and weather data were used in 27 (62.8%) and mosquito abundance or related data were used in 14 (32.6%) articles as model covariates. A total of 140 models were included across the articles. Rainfall (69 models, 49.3%), temperature (66, 47.1%) and tide height (45, 32.1%) were the three most commonly used exposures. Ten (23.3%) studies published data related to model performance. This review summarises current knowledge of RRV modelling and reveals a research gap in comparing predictive methods. To improve predictive accuracy, new methods for forecasting, such as non-linear mixed models and machine learning approaches, warrant investigation.


Introduction
Ross River virus, a mosquito-transmitted Alphavirus, is the most common arboviral infection of humans in Australia [1,2] and often results in a characteristic syndrome, including constitutional effects, rash, and rheumatic manifestations [1,3]. A total of 123,875 cases of RRV infection were reported from 1993 to 2019 in Australia, of which nearly half (48.8%) were from Queensland [4]. Ross River virus transmission is primarily influenced by mosquito abundance, reservoir host populations, and climatic, environmental (e.g. rainfall, temperature, tides, river flow, vegetation cover) and socio-economic factors (e.g. urban development, housing infrastructure) [1,5-9]. Models of RRV using these exposures can improve knowledge of RRV transmission or be used to give early warning of outbreaks, thus aiding disease prevention and control. However, the relationships between exposures and RRV incidence are complex. For instance, climate can influence vector abundance, host populations and the behaviour of vectors and hosts, and climate and weather are influenced by human behaviour (e.g. global warming, heat island effects, effects of large dams) and geographical factors like altitude [10,11]. Therefore, exposures do not have a simple correlation with disease incidence, which increases the difficulty of forecasting.
Generalised linear regression and time-series models are widely used for infectious disease prediction [12]. Linear regression models are straightforward, but often inadequate for prediction in complex systems. Time-series models are especially suitable for analysing data that contain autocorrelation and show periodic fluctuations [13,14]. Three reviews on exposures or predictive models of RRV have been published; however, all concentrated on describing exposures and their relationships with disease, with less attention to models and their performance, and none was a systematic review. A review by Tong et al. (2008) [8] included more than 15 articles on predictors of RRV transmission. Analytical methods were listed, and the detailed research process and results were described to elucidate the association between climatic, social and environmental factors and RRV disease. A second review [15] identified research on the impact of climate change on RRV disease. All models applied in these studies were listed, but the characteristics of the models were not discussed. Another review by Jacups et al. (2008) [9] described the vectors and vertebrate reservoirs of RRV and the possible impact of climate change on incidence, and summarised the models and the climatic factors applied in 15 studies. RRV models were discussed in that review, but the focus was on how covariates and the geographical size of the study areas influenced model accuracy. These three reviews neither provide a detailed profile of the models nor quantify their performance.
In this review, the research hypothesis is that a detailed summary of all available primary modelling research for RRV enables an evaluation of the effectiveness of the models in forecasting disease and in improving knowledge of exposures and the transmission cycle. We aim to describe the modelling approaches applied in RRV disease prediction in Australia, the variables used, and the predictive performance of these models.

Methods
This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [16]. The PRISMA Checklist is available in S1 Table. The proposal for this study was completed before data extraction (S1 Text).

Literature search strategy
We performed a structured literature search using PubMed, EMBASE, Web of Science, Cochrane Library, and Scopus for articles published between January 1, 1980 and January 21, 2020 with search terms encompassing pathogen (i.e. "Ross River virus"), methods (i.e. "model" OR "forecast"), and exposures (i.e. "impact factor" OR "predictor" OR "association"). Articles not relevant to our aims were excluded (i.e. "gene" OR "protein" OR "transfusion"). These search terms efficiently excluded irrelevant records. Studies on genes or proteins were mainly laboratory-based and not relevant for epidemiological risk prediction. Studies on transfusion-related RRV transmission were excluded because such transmission is infrequent and can be ignored. In addition, findings could not feasibly be integrated with studies of mosquito-transmitted disease. Search terms are provided in S2 Text.

Inclusion and exclusion criteria
Studies of RRV using population-based data, incorporating at least one epidemiological model and analysing the association between an exposure or exposures and RRV incidence or outbreaks, located in Australia, in English and with full-text available were included in this review. Review articles, meeting abstracts, letters, books, reports and comments were excluded. Studies on RRV virology, vaccines, and animal models were also excluded. Articles describing models of RRV vectors or non-human reservoir hosts without human epidemiological data were excluded, as were studies involving transmission dynamic modelling. The records were first screened by titles and abstracts, then the full texts were reviewed before a final decision on inclusion. Study inclusion was conducted by one author (WQ), and in cases of uncertainty, all four authors reached a decision after discussion.
For included records, title, author, publication year, research area and period, predictors and format of predicted outcome, modelling method, significant results, prediction performance, model evaluation and model validation were extracted.

Methodological quality assessment and data extraction
Included studies were assessed according to recently published criteria for observational studies [17-20]. Each criterion was scored 2, 1 or 0 if the studies fully, partly or barely met the criterion (S2 Table). The statement of funding and conflict of interest were each scored 1 if they were stated clearly. The studies were classified into three groups depending on total scores: high (19-24), medium (13-18), and low quality (<13). Related study registrations were searched to evaluate publication bias. Data extraction and quality assessment were conducted by one author (WQ) and discussed by all authors where there were uncertainties.
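The banding rule above can be written as a short function. The following is a minimal sketch, not the authors' scoring tool: the function name is hypothetical, and the maximum total of 24 is assumed from the upper limit of the high-quality band.

```python
def classify_quality(total_score: int) -> str:
    """Map a total quality score to the three bands used in this review.

    Assumption: totals range from 0 to 24, as implied by the upper
    limit of the 'high' band (19-24).
    """
    if not 0 <= total_score <= 24:
        raise ValueError("total score must lie between 0 and 24")
    if total_score >= 19:
        return "high"      # 19-24
    if total_score >= 13:
        return "medium"    # 13-18
    return "low"           # below 13


# Hypothetical totals for three studies
for score in (21, 15, 9):
    print(score, classify_quality(score))
```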
All the authors participated in the entire review process, discussed the main decisions and reached agreement on study selection, data extraction and study assessment together. Study characteristics and model performance were tabulated. Exposures applied in these models were summarised and their association with RRV listed.
The transmission cycle of RRV and key exposures influencing RRV infection are illustrated in Fig 1.

Results
After duplicates were removed from 2,227 searched records, we screened 976 papers; after exclusion criteria were applied, 43 records remained (Fig 2). All studies were published in the last 20 years, and 19 (44.2%) during the past decade. Nearly half the studies were conducted in Queensland (20, 46.5%), while five (11.6%) were in Western Australia.
The quality scores for the studies are listed in Table 1. Detailed scores are provided in S3 Table. All studies attained high (33, 76.7%) or medium quality scores (10, 23.3%). There were two articles published without significant results. No systematic review registration related to RRV modelling was found.
Fourteen articles (32.6% of 43 articles) used mosquito abundance or related data as model covariates. Climate and weather data were used by 27 of the 43 studies (62.8%). Other exposures such as river flow, distance to surface water sources, historical RRV cases and host population were also used (Table 5). Rainfall (applied in 69 models, 49.3% of 140 models), temperature (66, 47.1%) and tide height (45, 32.1%) were the three most commonly used exposures.
Most studies (23, 53.5% of 43 studies) used incidence rates or disease occurrence as dependent variables (Table 2). Twelve studies used outbreaks as dependent variables (Table 3), while ten used notified cases (Table 4). Two articles used incidence rates and outbreaks in different models. Linear models were applied for analysing all forms of notified data. Most studies using time-series and spatial analysis models forecast incidence rather than outbreaks. All studies using CUSUM-based models predicted outbreaks.
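To illustrate why cumulative sum (CUSUM) methods lend themselves to outbreak detection, the following is a minimal sketch of a generic one-sided upper CUSUM applied to case counts. It is not the specific procedure used in any included study; the function name, the parameters (baseline_mean, allowance, threshold) and the example counts are all illustrative assumptions.

```python
def cusum_alarms(counts, baseline_mean, allowance, threshold):
    """Return the time indices at which a one-sided upper CUSUM signals.

    The statistic accumulates the excess of each count over
    (baseline_mean + allowance) and is floored at zero. An alarm is
    raised when it exceeds `threshold`, after which it is reset.
    All parameter values are assumptions chosen for illustration.
    """
    alarms = []
    s = 0.0
    for t, x in enumerate(counts):
        s = max(0.0, s + (x - baseline_mean - allowance))
        if s > threshold:
            alarms.append(t)
            s = 0.0  # restart accumulation after signalling
    return alarms


# Hypothetical weekly notification counts
weekly_cases = [2, 3, 2, 8, 9, 3]
print(cusum_alarms(weekly_cases, baseline_mean=3, allowance=1, threshold=5))
```

Because the statistic accumulates sustained excesses over a baseline, its natural output is a binary signal (alarm or no alarm) rather than a forecast of incidence.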
Only ten of the 43 studies published data related to model performance; among them, five used logistic regression models, one used a hurdle model, one used CUSUM-based methods, one used a classification and regression tree (CART), one used a polynomial distributed lag model and one used a negative binomial regression model (Table 6). Seven of these studies predicted RRV outbreaks, one predicted incidence and two predicted cases. Most of the models achieved accuracies or overall agreements of 75.0% or higher.
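For reference, accuracy (overall agreement), sensitivity and specificity for a binary outbreak prediction can be derived from the four cells of a confusion matrix. The sketch below uses hypothetical observed and predicted outbreak indicators; the function name and data are assumptions for illustration and do not reproduce results from any included study.

```python
def outbreak_metrics(observed, predicted):
    """Compute accuracy, sensitivity and specificity for binary outcomes.

    `observed` and `predicted` are equal-length sequences of 0/1 values,
    where 1 denotes an outbreak period.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # outbreaks correctly flagged
    tn = sum(1 for o, p in pairs if not o and not p)  # quiet periods correctly passed
    fp = sum(1 for o, p in pairs if not o and p)      # false alarms
    fn = sum(1 for o, p in pairs if o and not p)      # missed outbreaks
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical outbreak indicators for eight periods
observed = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]
print(outbreak_metrics(observed, predicted))
```

Reporting sensitivity and specificity alongside accuracy matters when outbreaks are rare, because a model that never predicts an outbreak can still achieve high overall agreement.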

Discussion
This systematic review provides a complete analysis of predictive models and exposures for predicting RRV incidence. In contrast to existing reviews which described the climatic, environmental and social factors incorporated in models, this review focuses on the modelling approaches and model performance. Most predictive models used generalised linear models and time series methods, but few studies presented model performance statistics. Many exposures have been included in these models; most of them are in one or two studies only. Rainfall and temperature are the most common exposures, and within the ranges studied, the association with RRV incidence is positive for both exposures in general. Mosquito abundance has a positive effect on RRV as expected.
Data quality was assessed in few studies. This is perhaps because data were collected from government or other public data repositories; consequently, data quality is implicitly considered to be good or the quality is difficult to assess. Some models (e.g. spatial analyses) are unable to predict disease frequency and consequently model evaluation or validation approaches cannot be applied.
This systematic review identified more than 60 exposures. Climate and weather influence mosquito breeding and behaviour of hosts, and therefore change the prevalence of the disease in a complex way [64]. The lag periods for climatic exposures differ for different parts of the transmission system. Weather can accelerate or decelerate mosquito breeding over a period of several days to weeks [11,34,65,66], while humans may adjust their behaviour immediately in response to weather changes, and host population structure and consequently seroprevalence may be affected by climate after a few years [10,67,68]. This phenomenon also explains why the same exposure can influence RRV incidence both positively and negatively at different lag times. Interactions between climatic exposures further complicate the analysis [65].
Data on vectors and reservoir host species, abundance and competence are crucial for forecasting RRV incidence [60,69,70]. The importance of vectors and reservoir hosts differs between species because of behavioural and ecological variation [71,72]. The feeding and breeding of mosquito species are affected by host availability and abundance [73,74]. Across urban, inland and coastal regions of Australia, vector and host species driving RRV transmission are diverse and variable [2]. Because of the wide variety of non-human reservoir hosts, it is extremely difficult to ascertain the complex relationships among hosts, vectors, and disease incidence. Epidemiological analyses and host ecology studies including serosurveys are important methods of detecting and describing these relationships. However, vector and reservoir host data with sufficient details and completeness to be useful for prediction are rarely available, impairing the quality of models.
Surface water sources, river flow, vegetation and remoteness, which were included only in a few studies, are promising data sources and should be explored further. Surface water and vegetation provide a favourable environment for mosquito breeding and are important for modelling [75,76]. These exposures are increasingly incorporated in recent models [53,61]. Inclusion of incidence terms from past weeks is also widely used in public health surveillance, e.g. the Early Aberration Reporting System, which offers aberration detection methods by analysing recent surveillance data [77]. The time-lag effects of RRV activity are generated not only by climatic factors but also by mosquito abundance, host populations and some geographical elements such as river flow and flooding [9,15,52,56,63,78]. The time lags are also influenced by the species diversity and abundance of mosquitoes in the research area. For instance, the freshwater-breeding Culex annulirostris is affected by rainfall and riverine flooding at freshwater habitats, while the estuarine-breeding Aedes vigilax is associated with estuarine wetlands shaped by tidal flooding and rainfall [2]. Thus, analysing temporal data is helpful to identify the temporal variation in these associations with RRV incidence. Moreover, the host population, mosquito breeding and people's lifestyle vary spatially. Data on the geographical difference and temporal trends of related exposures can be valuable for RRV prediction.
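One simple way to explore such time-lag effects is to correlate an exposure series with the case series at a range of lags. The sketch below is an exploratory illustration only, assuming equally spaced observations and using synthetic data; the function names are hypothetical, and lagged correlation is a screening device rather than a substitute for the regression and time-series models reviewed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_correlations(exposure, cases, max_lag):
    """Correlate exposure at time t - lag with cases at time t."""
    if len(exposure) != len(cases):
        raise ValueError("series must have equal length")
    result = {}
    for lag in range(max_lag + 1):
        x = exposure[: len(exposure) - lag]  # exposure, shifted back by `lag`
        y = cases[lag:]                      # cases observed `lag` steps later
        result[lag] = pearson(x, y)
    return result


# Synthetic example: cases respond to rainfall two periods later
rainfall = [0, 5, 0, 0, 8, 0, 0, 3, 0, 0]
cases = [1, 1, 1, 11, 1, 1, 17, 1, 1, 7]
correlations = lagged_correlations(rainfall, cases, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

In real data the strongest lag may differ between regions and vector species, which is one reason the same exposure can show positive and negative associations at different lags.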
Our systematic analysis showed that linear models and time-series approaches are the two main analytical methods used to predict RRV disease. Linear regressions are simple to manipulate and explain, while time-series models are appropriate for considering autocorrelation and seasonal fluctuations. Both approaches have been widely applied in dealing with infectious diseases [79,80]. Their pros and cons are described in some articles [81-83]. Models with good predictive performance perform well at predicting outcomes for out-of-sample data [84]. Usually cross-validation is used to assess model performance in retrospective studies, and reserving 25% of available data for validation is recommended [84]. Some statistics can be derived for evaluation, such as accuracy, specificity, sensitivity, mean-squared error, mean absolute error or root mean-squared error [85]. Head-to-head comparisons of models using common datasets are suggested for model assessment [86]. Robustness of the models needs to be tested under various settings [86]. The best modelling approach for RRV prediction is currently unclear. Therefore, performance data for RRV predictive models are needed in order to compare them and select the best one for a given setting.
This is the first systematic review focusing on modelling approaches for predicting RRV disease. This is also the first review that lists statistical methods, significant exposures and the modelling performance of selected studies. Only studies conducted in Australia and published in English were included. We did not search grey literature. We were unable to evaluate publication bias; however, two of the included studies were published without significant results. Although we have summarised broad findings in relation to exposures, we have not conducted meta-analysis. Insufficient data were available to assess performance of most models. Therefore, we were not able to strictly compare models and establish an appropriate level of confidence in their performance. The descriptions of predictive models included in this review are based on current publications, so these data need to be interpreted with caution. The information we have presented is dependent on the limitations of the included studies. A particular issue in RRV research is the non-equivalence between routinely collected surveillance data and RRV incidence. There are also significant limitations for exposure data, brought about by the siting and number of weather stations, incompleteness of macropod data, variability in mosquito enumeration due to characteristics of particular trap types, and other issues.
Table footnote: N_sm is the number of models that have significant exposures; N_m is the number of models that used the exposures. Numbers are summarised in categories of mosquitoes and non-human reservoir hosts, which indicates one or more species are applied in each article or each model.
Given the complex transmission cycle of the virus, exposures and RRV incidence would not be expected to have a simple linear relationship. Non-linear models such as generalised additive mixed models and machine learning approaches are likely to provide a more sophisticated representation of the transmission system than linear regression [87-89]. Analytical methods that encompass climate, environmental exposures, socio-economic factors and spatio-temporal aspects for forecasting RRV incidence are also worthy of consideration. For example, Bayesian spatio-temporal modelling by Hu (2010) [45] considered the spatial effects, temporal trends, climatic exposures and an interaction term for climate exposures. Region-specific models are ideal, due to spatial variation in transmission [53]. The complex ecology and the environmental variation in Australia make it challenging to design models with universal applicability that are useful for public health programs. However, there is benefit in assessing the performance of these models, as we have done in this review, to determine usefulness, even if this means rejecting some approaches. Our work will continue with the development of RRV models for Queensland using innovative modelling approaches and then assessing their predictive performance.
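The out-of-sample evaluation discussed earlier in this section can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a single temporal hold-out of the final 25% of a series rather than full cross-validation, the forecast is a naive training-period mean standing in for a real model, and the function names and case counts are hypothetical.

```python
from math import sqrt


def holdout_split(series, validation_fraction=0.25):
    """Reserve the final fraction of a time-ordered series for validation."""
    if len(series) < 2:
        raise ValueError("need at least two observations to split")
    n_val = max(1, round(len(series) * validation_fraction))
    n_val = min(n_val, len(series) - 1)  # keep at least one training point
    return series[:-n_val], series[-n_val:]


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared prediction error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical monthly case counts
monthly_cases = [4, 6, 5, 5, 8, 2, 7, 3]
train, validation = holdout_split(monthly_cases)

# Placeholder forecast: the training mean, used only to exercise the metrics
forecast = [sum(train) / len(train)] * len(validation)

print(mean_absolute_error(validation, forecast))
print(root_mean_squared_error(validation, forecast))
```

Applying the same split and the same error statistics to competing models on a common dataset is what makes a head-to-head comparison possible.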
Our systematic review provides an analysis of epidemiological models for predicting RRV disease using notification data in Australia. Current modelling approaches are valuable in improving understanding of RRV transmission and in predicting outbreaks. However, model performance assessments are notably lacking. Nonetheless, the summary of significant exposures provided in our systematic review offers suggestions for future modelling. Better data availability, combined with new modelling approaches and performance assessment, may improve the accuracy of forecasting. More detailed information, such as daily or weekly data on RRV cases and climatic exposures at a smaller spatial scale, will improve model prediction performance [53]. RRV ecology research that provides data on the abundance or spatio-temporal distribution of the mosquitoes and non-human reservoir hosts is beneficial for modelling the transmission cycle and forecasting disease incidence.
Supporting information S1