Protocol S4: Model validation procedures and additional results
Assessing the plausibility of the model output is essential for reliable interpretation of the mapped output. Many different measures of map uncertainty are available. Here, two different aspects of the performance of the predictive model were assessed using a range of validation statistics. This section describes in detail the procedures used to define a validation set, obtain validation data, and compute a series of summary validation statistics and plots, as well as presenting the results of these analyses in full. The procedures follow very closely those used previously for P. falciparum [1,2], but for completeness are detailed here in full.
S4.1 Creation of Validation Sets
Validation statistics obtained via prediction of a validation set are representative of model performance only if the validation set itself is a representative sample of the prediction space. Visual examination of the PvPR point data used in this study revealed clear evidence of spatial clustering (Figure 2A, main text). As such, a simple random sample drawn from this set would be similarly clustered and not spatially representative of the predicted PvPR1-99 surface as a whole. To generate a spatially representative validation set, the full set of 9,970 data locations was stratified into the four global modelling sub-regions (see Protocol S2, section S2.6) and a spatially declustered sampling procedure was implemented within each. Thiessen polygons were defined around each data location $x_i$ within each region. A Thiessen polygon defines the area closest to each data point in Euclidean space relative to surrounding points. Each datum was then assigned a weight $w_i$ defined as $w_i = A_i / \sum_j A_j$, where $A_i$ is the area of the Thiessen polygon surrounding the data location $x_i$. A sample of size $n$ was drawn without replacement from the regional set, where each datum had a probability of selection proportional to its weight, $w_i$. Those surveys located outside the stable limits of transmission were excluded from selection.
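The declustered draw described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Thiessen polygon areas have already been computed elsewhere (for example in a GIS), interprets "without replacement" as successive draws in which each remaining location is selected in proportion to its area, and uses the hypothetical function name `declustered_sample`.

```python
import random


def declustered_sample(areas, n, seed=None):
    """Draw n distinct indices, each draw proportional to polygon area.

    areas: Thiessen polygon area for each eligible data location
           (assumed precomputed; locations outside the stable limits
           of transmission are assumed to have been removed already).
    n:     size of the hold-out sample.
    """
    if n > len(areas):
        raise ValueError("sample size exceeds number of eligible locations")
    rng = random.Random(seed)
    remaining = list(range(len(areas)))    # original indices still available
    weights = [float(a) for a in areas]    # weights aligned with `remaining`
    chosen = []
    for _ in range(n):
        # Pick a position among the remaining locations, in proportion
        # to their weights; normalising the weights is not required.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return chosen


# Example: isolated locations (large polygons) are favoured over clustered ones.
holdout = declustered_sample([0.2, 0.3, 0.1, 5.0, 4.0], n=2, seed=42)
```

Because large polygons belong to isolated surveys, weighting by area thins dense clusters less aggressively than it samples sparse areas, which is what makes the resulting hold-out set more spatially even than a simple random sample.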
Hold-out sets were thus defined for each modelling sub-region, of size $n$ = 50, 175, 267 and 530 for the Americas, Africa+, Central Asia and SE Asia regions, respectively. The model was then re-run in full for each region independently using the corresponding thinned sets of 338, 1,465, 2,398 and 4,747 data to predict PvPR at the validation locations. In contrast to the main model run, in which predictions were made for an annual mean for 2010, the validation run predicted values for the time corresponding to the mid-point of each validation survey to enable fairer comparisons of the observed and predicted PvPR values. We evaluated here the ability of the model to predict PvPR within the age limits reported by each study, rather than age-adjusted PvPR1-99. This was deemed a more thorough test of the overall predictive fidelity of the model, because generating predictions for non-standardised age ranges required an additional age-correction step, and better represented the target quantities generated by the model.
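As a quick consistency check on the figures quoted above, the hold-out and thinned sets together account for the full set of data locations:

$$50 + 175 + 267 + 530 = 1{,}022, \qquad 338 + 1{,}465 + 2{,}398 + 4{,}747 = 8{,}948, \qquad 1{,}022 + 8{,}948 = 9{,}970.$$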
S4.2 Procedures for Testing Model Performance
Predictive performance of the model was tested using two different approaches: the ability of the model to (i) predict point-values of PvPR at unsampled locations; and (ii) provide a realistic measure of uncertainty for each prediction.
Predicting Point Values of PvPR
The validation procedure generated $n$ = 1,022 point estimates of PvPR, where point estimates were calculated using the mean of each predicted posterior distribution. This set of point estimates $p^*(x_i)$ (where the asterisk denotes a prediction) was then compared to the corresponding set of observed PvPR values $p(x_i)$ at the validation locations. The ability of the model to predict point-values of PvPR at unsampled locations was quantified using three simple summary statistics: the correlation coefficient between the predicted and actual set, the mean prediction error (ME) defined as:
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(p^*(x_i) - p(x_i)\right),$$ (S4.1)
and the mean absolute prediction error (MAE) defined as:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|p^*(x_i) - p(x_i)\right|.$$ (S4.2)
The correlation coefficient provides a straightforward measure of linear association between the data and prediction sets, the ME provides a measure of the bias of the predictor (the overall tendency to over or under predict PvPR values), and the MAE provides a measure of the mean accuracy of individual predictions (the average magnitude of difference between each actual and predicted value). ME and MAE values were presented as both absolute values and as a proportion of the mean PvPR in each region as calculated from the validation set. A scatter plot was also generated as a visualisation of the correspondence between point estimates of PvPR and the corresponding known values.
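The three summary statistics can be computed as in the sketch below. It assumes the Pearson form of the correlation coefficient and a predicted-minus-observed sign convention for the ME, so that negative values indicate underestimation; the function name `validation_summary` and the dictionary keys are illustrative only.

```python
from math import sqrt


def validation_summary(predicted, observed):
    """Correlation, ME and MAE for paired predicted and observed PvPR values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    # Prediction errors: predicted minus observed (assumed sign convention).
    errors = [p - o for p, o in zip(predicted, observed)]
    me = sum(errors) / n                     # bias of the predictor
    mae = sum(abs(e) for e in errors) / n    # mean accuracy of individual predictions

    # Pearson correlation between the predicted and observed sets.
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    corr = cov / sqrt(var_p * var_o)

    return {
        "correlation": corr,
        "ME": me,
        "MAE": mae,
        # Errors expressed as a proportion of the mean observed PvPR.
        "ME_rel": me / mean_o,
        "MAE_rel": mae / mean_o,
    }


# Illustrative values only, in units of PvPR on a percentage scale.
stats = validation_summary([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Reporting the ME and MAE alongside their values relative to the mean observed PvPR makes regions with very different endemicity levels easier to compare.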
A sample semi-variogram was calculated from standardised model residuals to assess the presence of residual spatial autocorrelation unexplained by the model output. Standardised Pearson residuals [3] $r_i$ were defined for each validation location as:
$$r_i = \frac{N_i^{+} - N_i\,p^*(x_i)}{\sqrt{N_i\,p^*(x_i)\left(1 - p^*(x_i)\right)}}$$ (S4.3)
where $N_i$ is the number of individuals surveyed in survey $i$, $N_i^{+}$ is the age-standardised number of P. vivax positive responses in that survey and $p^*(x_i)$ is the corresponding point-prediction of PvPR. This standardisation follows established procedures [4,5] and rescales the raw model residuals to account for their variance characteristics as proportion values. Following the procedure outlined by Diggle and Ribeiro [6], this sample semi-variogram was compared to a Monte Carlo envelope computed from 99 random permutations of the same residual set. This envelope represents the range of semi-variograms that could be expected by chance in the absence of any spatial structure. Where the semi-variogram of interest lies entirely within this envelope, it can be considered to display no significant spatial structure.
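The residual check can be sketched as below. The sketch makes several assumptions that are not taken from the protocol: the residual uses the standard binomial Pearson form, distances are planar Euclidean distances on projected coordinates, the semi-variogram is the classical estimator (half the mean squared difference per distance bin), and the function names are hypothetical.

```python
import math
import random


def pearson_residual(n_surveyed, n_positive, p_pred):
    """Standardised Pearson residual for a binomial count (assumed form)."""
    expected = n_surveyed * p_pred
    return (n_positive - expected) / math.sqrt(expected * (1.0 - p_pred))


def semivariogram(coords, values, bin_edges):
    """Empirical semi-variogram: half the mean squared difference per lag bin."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = math.dist(coords[i], coords[j])
            for b in range(n_bins):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    sums[b] += 0.5 * (values[i] - values[j]) ** 2
                    counts[b] += 1
                    break
    # Bins with no pairs are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


def permutation_envelope(coords, values, bin_edges, n_perm=99, seed=None):
    """Per-bin min and max semi-variance over random permutations of the values."""
    rng = random.Random(seed)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between value and location
        sims.append(semivariogram(coords, shuffled, bin_edges))
    lower, upper = [], []
    for b in range(len(bin_edges) - 1):
        vals = [s[b] for s in sims if s[b] is not None]
        lower.append(min(vals) if vals else None)
        upper.append(max(vals) if vals else None)
    return lower, upper


# Toy example: three locations on a line with residuals 1, 3 and 5.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
resid = [1.0, 3.0, 5.0]
edges = [0.5, 1.5, 2.5]
gamma = semivariogram(coords, resid, edges)
lower, upper = permutation_envelope(coords, resid, edges, n_perm=99, seed=0)
```

Permuting the residuals across locations preserves their marginal distribution while destroying any spatial arrangement, so an observed semi-variogram that stays between `lower` and `upper` at every lag is consistent with no remaining spatial structure.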
Providing Realistic Measures of Uncertainty for Each Prediction
Posterior distributions arising from Bayesian models provide an estimate of the relative probability of a particular outcome and can be used to characterize uncertainty of prediction [7]. Our model generated a posterior distribution for each unsampled location and a procedure [8-10] was implemented to test how well the validation set of 1,022 posterior distributions captured the true uncertainty in our model output. A widely used summary measure extracted from predicted posterior distributions is the credible interval (CI), which defines a range of candidate values associated with a specified predicted probability of occurrence. The 95% CIs, for example, are commonly reported around parameter estimates and define the range of possible values for that parameter that has a 0.95 probability of containing the true value. Credible intervals can be extracted from a posterior distribution for any specified level of probability, and can be tested in a validation procedure against the actual proportion of true values falling within different intervals. In a perfect model, for example, 95% of true values should fall within the 95% CI predicted at each location, 50% within the 50% CI, and so on. In this study, we implemented a procedure [8,10] using this rationale to test the extent to which predicted posterior distributions at each location provided a suitable measure of uncertainty. Working through 100 progressively narrower CIs, from the 99% CI to the 1% CI, each was tested by computing the actual proportion of held-out prevalence observations that fell within the predicted CI. Plotting these actual proportions against each predicted CI level allowed the overall fidelity of the posterior probability distributions predicted at the held-out data locations to be assessed.
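The coverage test can be sketched as follows, assuming each posterior distribution is available as a list of samples. The equal-tailed form of the credible interval, the simple empirical-quantile rule, the default 1% spacing of levels (99%, 98%, ..., 1%) and the function names are all assumptions made for illustration.

```python
def central_interval(ordered, level):
    """Equal-tailed credible interval from sorted posterior samples."""
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lo = ordered[int(round(tail * last))]
    hi = ordered[int(round((1.0 - tail) * last))]
    return lo, hi


def coverage_curve(posterior_samples, observed, levels=None):
    """Fraction of observations inside the predicted CI at each level."""
    if levels is None:
        # Progressively narrower intervals: 99%, 98%, ..., 1%.
        levels = [k / 100.0 for k in range(99, 0, -1)]
    # Sort each posterior sample set once, then reuse it for every level.
    ordered_sets = [sorted(samples) for samples in posterior_samples]
    curve = []
    for level in levels:
        hits = 0
        for ordered, obs in zip(ordered_sets, observed):
            lo, hi = central_interval(ordered, level)
            if lo <= obs <= hi:
                hits += 1
        curve.append((level, hits / len(observed)))
    return curve


# Toy example: four locations sharing the same posterior samples 0..100.
posteriors = [list(range(101)) for _ in range(4)]
obs = [50, 50, 10, 99]
curve = coverage_curve(posteriors, obs, levels=[0.9, 0.5])
```

Plotting the second element of each pair against the first gives the probability-probability plot: points on the 1:1 line indicate well-calibrated intervals, points below it indicate intervals that are too narrow, and points above it indicate intervals that are too wide.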
S4.3 Validation results
Examination of the mean error in the generation of the P. vivax malaria endemicity point-estimate surface (Figure S4.1) revealed minimal overall bias in predicted PvPR with a global mean error of -0.46 (Americas -1.38, Africa+ -0.03, Central Asia -0.43, South East Asia -0.43), with values in units of PvPR on a percentage scale (Table S4.1). The global value thus represents an overall tendency to underestimate prevalence by just under half of one percent. The mean absolute error, which measures the average magnitude of prediction errors, was 2.48 (Americas 5.05, Africa+ 0.53, Central Asia 1.52, South East Asia 3.37), again in units of PvPR (Table S4.1). These values give an indication of the consequences of taking values from the point-estimate map as operational estimates of endemicity in each pixel. The estimate is nearly unbiased (i.e. mean errors are small), but variance between predicted and observed endemicities (i.e. mean absolute errors) can be substantial due to the short-range heterogeneity observed in the data and the patchy distribution of the dataset. We have provided elsewhere [11] a more in-depth discussion on approaches to utilising the point-estimate map and associated posterior distribution estimates in downstream quantitative analyses. A semi-variogram of model residuals (Figure S4.1B), defined as Pearson residuals divided by sample size, showed minimal spatial structure. This indicates that the model explains nearly all spatially autocorrelated variation in the observed data, with the unexplained variation being unstructured noise.
The probability-probability plot comparing predicted quantiles with observed coverage fractions (Figure S4.1C) shows the fraction of the observations that were actually contained within each predicted credible interval. This plot illustrates a high degree of fidelity in the predicted quantiles or, put simply, that the predictive distribution is a good representation of the uncertainty in our predictions.
References
1. Gething P, Patil A, Smith D, Guerra C, Elyazar I, et al. (2011) A new world malaria map: Plasmodium falciparum endemicity in 2010. Malaria J 10: 378.
2. Hay SI, Guerra CA, Gething PW, Patil AP, Tatem AJ, et al. (2009) A world malaria map: Plasmodium falciparum endemicity in 2007. PLoS Med 6: e1000048.
3. McCullagh P, Nelder JA (1989) Generalized Linear Models. Boca Raton, Florida: Chapman and Hall / CRC Press. 540 p.
4. Clements ACA, Moyeed R, Brooker S (2006) Bayesian geostatistical prediction of the intensity of infection with Schistosoma mansoni in East Africa. Parasitology 133: 711-719.
5. Diggle P, Moyeed R, Rowlingson B, Thomson M (2002) Childhood malaria in The Gambia: a case-study in model-based geostatistics. J Roy Stat Soc C-App 51: 493-506.
6. Diggle PJ, Ribeiro PJ (2007) Model-based geostatistics; Bickel P, Diggle P, Fienberg S, Gather U, Olkin I et al., editors. New York: Springer. 228 p.
7. Congdon P (2003) Applied Bayesian Modelling. Chichester: John Wiley and Sons Ltd. 457 p.
8. Gething PW, Noor AM, Gikandi PW, Hay SI, Nixon MS, et al. (2008) Developing geostatistical space-time models to predict outpatient treatment burdens from incomplete national data. Geogr Anal 40: 167-188.
9. Goovaerts P (2001) Geostatistical modelling of uncertainty in soil science. Geoderma 103: 3-26.
10. Moyeed RA, Papritz A (2002) An empirical comparison of kriging methods for nonlinear spatial point prediction. Math Geol 34: 365-386.
11. Patil AP, Gething PW, Piel FB, Hay SI (2011) Bayesian geostatistics in health cartography: the perspective of malaria. Trends Parasitol 27: 245-252.
Table S4.1. Summary of the validation statistics for predicting continuous PvPR by region. The mean of each predicted posterior distribution was used as the point estimate of PvPR for comparison with observed values. See text for a full explanation on the derivation of these statistics and interpretation of results. C Asia = Central Asia modelling region; SE Asia = South East Asia modelling region.
Validation Measure     Americas   Africa+   C Asia   SE Asia   World
Mean error             -1.38      0.03      -0.43    -0.43     -0.41
Mean absolute error    5.05       0.53      1.52     3.37      2.48
Correlation            0.37       0.67      0.49     0.58      0.56
Figure S4.1. Model Validation Plots. (A) Scatter plot of actual versus predicted point-values of PvPR1-99. (B) Sample semi-variogram of standardised model Pearson residuals estimated at discrete lags (circles) and compared to a Monte Carlo envelope (dashed lines) representing the range of values expected by chance in the absence of spatial autocorrelation. (C) Probability-probability plot comparing predicted credible intervals with the actual percentage of true values lying in those intervals. In plots A and C the 1:1 line is also shown (dashed line) for reference. See text for full explanation of validation procedures and interpretation of results.