Protocol S1: Supplemental Methods
S1.1. Statistical Model for P. falciparum Prevalence
Full details of the geostatistical model for P. falciparum prevalence are provided in Protocol S3 of Hay et al. (2009) [1]. Briefly, the $i$th observation $N_i^+$ was modelled as binomially distributed, conditional on the sample size $N_i$, the age limits $A_i$ and a spatiotemporal random field […] evaluated at the location $x_i$ and time $t_i$ of the survey. The typical prevalence […] was a composition of a link function, in this case the standard inverse-logit, and a Gaussian process denoted $f$. The random function […] mapped […] and […] to community prevalence within the age limits of survey $i$. The distributional parameters of the unknown function $f$ were a mean function $M_\phi$ and a covariance function $C_\phi$, where $\phi$ denotes scalar parameters, including covariate coefficients [2,3]. The model was completed by priors for $\phi$. In probability notation:
[…] (SA.1)
For clarity, this section will refer to this schematic representation of the model.
A major reason for the success of Gaussian processes in geostatistical modelling is that they are trivial to marginalize. The vector $f(\mathbf{x},\mathbf{t})$ obtained by evaluating the Gaussian process $f$ at all the survey locations and times $(\mathbf{x},\mathbf{t})$ has a multivariate normal distribution:
$$ f(\mathbf{x},\mathbf{t}) \mid \phi \sim \mathrm{N}\bigl(M_\phi(\mathbf{x},\mathbf{t}),\; C_\phi(\mathbf{x},\mathbf{t};\mathbf{x},\mathbf{t})\bigr) \tag{SA.2} $$
The mean vector $M_\phi(\mathbf{x},\mathbf{t})$ is constructed in the same way as $f(\mathbf{x},\mathbf{t})$, and the covariance matrix $C_\phi(\mathbf{x},\mathbf{t};\mathbf{x},\mathbf{t})$, for $n$ survey observations, is defined as:
$$ C_\phi(\mathbf{x},\mathbf{t};\mathbf{x},\mathbf{t}) = \begin{bmatrix} C_\phi(x_1,t_1;x_1,t_1) & \cdots & C_\phi(x_1,t_1;x_n,t_n) \\ \vdots & \ddots & \vdots \\ C_\phi(x_n,t_n;x_1,t_1) & \cdots & C_\phi(x_n,t_n;x_n,t_n) \end{bmatrix} \tag{SA.3} $$
Conditional on $f(\mathbf{x},\mathbf{t})$, the data are independent of $f$ evaluated at all other locations, so this marginalization can be used to reduce $f$ to a simple multivariate normal variable at the model-fitting stage.
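As a minimal sketch of Eqs. (SA.2) and (SA.3), the following Python evaluates a mean function and a covariance function at every survey location and time to obtain the mean vector and covariance matrix. The separable exponential covariance, the constant mean, the example survey points and all function names (`exp_cov`, `mean_vector`, `cov_matrix`) are assumptions made for illustration; they are not the covariance model of Hay et al. [1].

```python
import math

def exp_cov(p, q, amp=1.0, space_scale=2.0, time_scale=1.0):
    """Illustrative covariance between two space-time points (x, y, t).

    Assumed form: product of exponential kernels in distance and in time lag.
    """
    d_space = math.hypot(p[0] - q[0], p[1] - q[1])
    d_time = abs(p[2] - q[2])
    return amp ** 2 * math.exp(-d_space / space_scale - d_time / time_scale)

def mean_vector(points, mean_fn):
    """Evaluate the mean function at every survey point."""
    return [mean_fn(p) for p in points]

def cov_matrix(points, cov_fn=exp_cov):
    """Entry (i, j) is the covariance between survey points i and j."""
    return [[cov_fn(p, q) for q in points] for p in points]

# Three hypothetical surveys, each given as (x, y, t).
surveys = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
M = mean_vector(surveys, lambda p: -1.0)  # constant mean on the logit scale (assumed)
C = cov_matrix(surveys)
```

The matrix `C` is symmetric by construction, and its size depends only on the number of surveys, not on the number of locations at which the field might later be predicted.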
Sampling the Posterior Predictive Distribution
The model was fitted using a Markov chain Monte Carlo (MCMC) algorithm [4], which produces a sequence of samples from the joint posterior of the parameters $\phi$ and the survey-location values $f(\mathbf{x},\mathbf{t})$. Conditional on one of these samples, the value of $f$ at an unobserved location and time $(x^*,t^*)$ can be sampled from its posterior predictive distribution:
$$ f(x^*,t^*) \mid f(\mathbf{x},\mathbf{t}), \phi \sim \mathrm{N}(M_p, C_p) \tag{SA.4} $$
where the posterior predictive parameters are given by the standard conditioning formulas for multivariate normal variables [5]:
$$ \begin{aligned} M_p &= M_\phi(x^*,t^*) + C_\phi(x^*,t^*;\mathbf{x},\mathbf{t})\, C_\phi(\mathbf{x},\mathbf{t};\mathbf{x},\mathbf{t})^{-1} \bigl[f(\mathbf{x},\mathbf{t}) - M_\phi(\mathbf{x},\mathbf{t})\bigr] \\ C_p &= C_\phi(x^*,t^*;x^*,t^*) - C_\phi(x^*,t^*;\mathbf{x},\mathbf{t})\, C_\phi(\mathbf{x},\mathbf{t};\mathbf{x},\mathbf{t})^{-1}\, C_\phi(\mathbf{x},\mathbf{t};x^*,t^*) \end{aligned} \tag{SA.5} $$
These formulas apply regardless of whether $(x^*,t^*)$ denotes multiple prediction locations or a single point. Samples from the predictive distribution $p(f(x^*,t^*) \mid f(\mathbf{x},\mathbf{t}), \phi)$ for many values of $f(\mathbf{x},\mathbf{t})$ and $\phi$ from the MCMC trace can be regarded as samples from the target predictive distribution $p(f(x^*,t^*) \mid \text{data})$ [4].
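The standard conditioning formulas for multivariate normal variables can be sketched for a single prediction point as follows. This is an illustrative, standard-library implementation under stated assumptions: the function names (`cholesky`, `solve_spd`, `predictive_params`) and the numerical values are hypothetical, the covariance matrix is assumed symmetric positive definite, and nothing here is taken from the authors' code.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def solve_spd(a, b):
    """Solve a x = b via the Cholesky factor of a."""
    L = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def predictive_params(m_star, c_star_star, c_star, m_obs, c_obs, f_obs):
    """Predictive mean and variance of f at one new point.

    m_star, c_star_star: prior mean and variance at the new point
    c_star: covariances between the new point and each observed point
    m_obs, c_obs, f_obs: prior mean, covariance matrix and sampled values
                         at the observed points
    """
    resid = [f - m for f, m in zip(f_obs, m_obs)]
    w = solve_spd(c_obs, c_star)             # w = c_obs^{-1} c_star
    mean_p = m_star + sum(wi * ri for wi, ri in zip(w, resid))
    var_p = c_star_star - sum(wi * ci for wi, ci in zip(w, c_star))
    return mean_p, var_p

# Hypothetical numbers: two observed points and one prediction point.
c_obs = [[1.0, 0.6], [0.6, 1.0]]
mean_p, var_p = predictive_params(0.0, 1.0, [0.8, 0.5], [0.0, 0.0], c_obs, [0.3, -0.2])

rng = random.Random(1)
draw = mean_p + math.sqrt(max(var_p, 0.0)) * rng.gauss(0.0, 1.0)  # one predictive sample
```

Repeating the last step for each retained MCMC sample, with `f_obs` and the covariance parameters taken from that sample, would give draws from the target predictive distribution described above.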
The global P. falciparum endemicity maps [1] were summaries of the posterior predictive distribution of an annual average at year […] and pixels […]:
[…] (SA.6)
where […] denotes the age range 2-10 years, which has been found previously to be highly responsive to transmission intensity [6]. These annual averages were preferred to the more standard point evaluation:
[…] (SA.7)
because malaria transmission is known to be seasonal, meaning no particular point in time adequately captures the overall annual transmission pattern.
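The difference between a point evaluation and an annual average can be illustrated with a toy seasonal signal. In this sketch the latent curve `seasonal_f`, its amplitude and phase, the monthly midpoint rule and the year 2007.0 are all hypothetical choices made for illustration; only the inverse-logit link is taken from the model description.

```python
import math

def inv_logit(z):
    """Standard inverse-logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def seasonal_f(t):
    """Hypothetical latent value at one pixel; t in years, peak at mid-year."""
    return -1.0 + 1.5 * math.sin(2.0 * math.pi * (t - 0.25))

def annual_average(f, year_start, n_steps=12):
    """Midpoint-rule approximation to the mean of inv_logit(f(t)) over one year."""
    dt = 1.0 / n_steps
    times = [year_start + (k + 0.5) * dt for k in range(n_steps)]
    return sum(inv_logit(f(t)) for t in times) / n_steps

low = inv_logit(seasonal_f(2007.0))    # point evaluation in the low season
high = inv_logit(seasonal_f(2007.5))   # point evaluation in the high season
avg = annual_average(seasonal_f, 2007.0)
```

The two point evaluations differ widely depending on when in the year they are taken, whereas the annual average gives a single summary that lies between them.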
Statistical Model Relating P. falciparum Prevalence and Clinical Incidence
The empirical prevalence-incidence model presented recently [7] modelled the relationship between P. falciparum prevalence and observed clinical incidence as a negative binomial distribution parameterised by the average parasite rate over the survey period in the 2-10 year old age group, […], and the effective length of the survey, […]:
[…] (SA.8)
The function […] was an increasing parabolic function of average parasite rate, and the mean incidence […] was assigned a Gaussian process prior. The posterior distribution of the random function […], which gave the overall relationship between average parasite rate and incidence, was used directly in the current work. This posterior distribution is illustrated in Figure A2.
S1.2. Volumetric Prediction and Joint Simulation
To produce maps summarizing the predictive distribution of a point evaluation, such as Eq. (SA.4), at a single point in time, it suffices to follow the procedure outlined above independently for each pixel. That is, the predictive distributions at the individual pixels in the output raster grid can be sampled and summarized independently. For large raster grids, the computational cost of producing such a map is proportional to the number of pixels in the raster.
Producing maps summarizing temporal integrals like Eq. (SA.6) is more involved. Due to the nonlinear link function, the integral must be approximated as a discrete sum [8] of evaluations at a vector $\mathbf{t}^*$ of time points spaced evenly across the year being averaged. The pixels can still be considered independently, but it is not sufficient to treat the time points within $\mathbf{t}^*$ independently in a given pixel. This is easy to understand by considering a limiting case: if $\mathbf{t}^*$ is very fine, the discrete sum should produce a very good approximation to the integral. However, if the random field is treated as independent at each point in $\mathbf{t}^*$, the sum amounts to taking the mean of a large number of independent random variables, so its variance is inversely proportional to their number. The variance of this mean will be very small: in other words, for very fine $\mathbf{t}^*$ the sum is nearly determined by the parameters $\phi$ and the survey-location values $f(\mathbf{x},\mathbf{t})$. This is clearly incorrect, because values of the field at nearby times are strongly correlated and averaging them removes little of the uncertainty. Predictive samples for the discrete sum must therefore be based on joint predictive samples of the field at all the time points in $\mathbf{t}^*$.
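The limiting-case argument can be checked numerically. The sketch below compares the variance of a time-averaged prevalence when the latent field is sampled jointly across time points with the variance obtained when each time point is sampled independently from the same marginal. The exponential temporal covariance (sampled exactly on a regular grid with an AR(1) recursion), the zero mean, the unit variance and every name in the code are assumptions for illustration, not the model fitted in [1].

```python
import math
import random
import statistics

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def joint_path(n, rho, rng):
    """Jointly sample a unit-variance Gaussian process at n evenly spaced times.

    The AR(1) recursion reproduces the correlation rho ** |i - j| exactly,
    i.e. an exponential covariance on a regular grid (assumed form).
    """
    f = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        f.append(rho * f[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    return f

def independent_path(n, rng):
    """Sample each time point from its marginal, ignoring temporal correlation."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def average_prevalence(f):
    """Discrete-sum approximation to the time average of inv_logit(f)."""
    return sum(inv_logit(v) for v in f) / len(f)

def compare(n_times, n_sims=2000, time_scale=1.0, seed=0):
    """Variance of the averaged prevalence under joint and independent sampling."""
    rng = random.Random(seed)
    rho = math.exp(-(1.0 / n_times) / time_scale)  # correlation between neighbours
    joint = [average_prevalence(joint_path(n_times, rho, rng)) for _ in range(n_sims)]
    indep = [average_prevalence(independent_path(n_times, rng)) for _ in range(n_sims)]
    return statistics.variance(joint), statistics.variance(indep)

var_joint, var_indep = compare(n_times=120)
```

Under these assumptions the independent version understates the variance of the average by a large factor, and the shortfall grows as the time grid is refined, whereas the joint version remains stable.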
The computational cost of the original P. falciparum endemicity maps [1] was proportional to the number of pixels in the raster grid, but the constant of proportionality was larger than it would be for maps based on point evaluations. At every pixel, and for every sample of the parameters $\phi$ and the survey-location values $f(\mathbf{x},\mathbf{t})$ from the MCMC trace, the covariance matrix over the vector of time points $\mathbf{t}^*$ had to be constructed and its Cholesky decomposition computed [9]. The computational costs of these operations were proportional to the square and the cube of the length of $\mathbf{t}^*$, respectively. However, since the length of $\mathbf{t}^*$ was much smaller than the size of the dataset, these maps were not much more expensive to compute than maps based on point evaluations.
The burden estimates presented in this paper are also based on predictive distributions of integrals:
[…] (A9)
where […] is the administrative unit or other geographical region in which total burden is to be computed, […] is the random function mapping prevalence to clinical incidence and […] is the GRUMP population surface [10] at year […]. Like Eq. (SA.6), this is a volumetric quantity that can be estimated as a discrete sum.
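A discrete-sum estimate of this kind can be sketched as a sum of incidence times population times time step over every pixel and time step in the region. The prevalence-to-incidence curve below is a placeholder chosen only to be increasing on [0, 1]; it is not the fitted relationship of [7], and the function names, array layout and numbers are likewise hypothetical.

```python
def incidence_per_person_year(prevalence):
    """Placeholder prevalence-to-incidence curve (NOT the fitted model of [7])."""
    return 1.2 * prevalence - 0.6 * prevalence ** 2  # increasing on [0, 1]

def total_burden(prevalence, population, dt_years,
                 incidence=incidence_per_person_year):
    """Sum incidence x population x time step over all pixels and time steps.

    prevalence[k][j]: prevalence at time step k in pixel j of the region
    population[j]:    number of people in pixel j
    dt_years:         length of one time step, in years
    """
    total = 0.0
    for prev_at_time in prevalence:
        for prev, pop in zip(prev_at_time, population):
            total += incidence(prev) * pop * dt_years
    return total

# Two hypothetical pixels over twelve monthly steps with constant prevalence.
population = [1000, 500]
prevalence = [[0.0, 0.5] for _ in range(12)]
burden = total_burden(prevalence, population, dt_years=1.0 / 12.0)
```

Applying such a function to each joint simulation of prevalence in turn would give one draw of total burden per simulation, and the collection of draws would approximate its predictive distribution.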
However, the scale is vastly larger. The current paper and planned future modelling work by the Malaria Atlas Project (MAP, http://www.map.ox.ac.uk) require aggregate burden estimates and population-weighted average prevalences for any desired spatial region, from individual pixels to continents, and over any time interval between 1985 (corresponding to the oldest points in the MAP database) and the present. A raster grid spanning the malarious regions of Africa at 5 × 5 km spatial resolution and monthly temporal resolution contains approximately 345 million pixels. An $O(n^2)$ covariance evaluation (where $O(n^2)$ denotes that the computational cost of this evaluation is proportional to the square of the number of prediction locations) is completely infeasible on this scale, let alone an $O(n^3)$ Cholesky decomposition. Block circulant embedding [11] can be used to sample multivariate normal variables in $O(n \log n)$ operations, but could not be adapted to our situation: reflecting the autocovariance array along three axes requires more memory than was available and, more importantly, due to the curvature of the Earth, the covariance matrix cannot be put in block Toeplitz form [11].
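A back-of-envelope calculation makes the infeasibility concrete. The sketch below takes the 345 million pixels quoted in the text and assumes dense double-precision storage, the leading-order flop count of a dense Cholesky factorization (one third of the cube of the matrix size), and a purely hypothetical sustained rate of 1e15 floating-point operations per second.

```python
n = 345_000_000                       # space-time pixels quoted in the text
bytes_per_float = 8                   # double precision (assumed storage format)

cov_entries = n ** 2                  # entries in the full covariance matrix
cov_bytes = cov_entries * bytes_per_float
cov_exabytes = cov_bytes / 1e18

cholesky_flops = n ** 3 / 3           # leading-order cost of a dense Cholesky
assumed_flops_per_second = 1e15       # hypothetical sustained rate, for scale only
seconds_per_year = 365.25 * 24 * 3600
cholesky_years = cholesky_flops / assumed_flops_per_second / seconds_per_year

print(f"covariance matrix: {cov_exabytes:.2f} EB")
print(f"Cholesky factorization: {cholesky_years:.0f} years at 1e15 flop/s")
```

Under these assumptions the covariance matrix alone would occupy just under one exabyte, and factorizing it would take several hundred years, which is why a different algorithm is needed.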
To overcome this computational hurdle, an application-specific algorithm was designed, which is described in full elsewhere [12]. In brief, the algorithm builds up the evaluations by scanning over time and space, taking advantage of the empirical fact that most of the information relevant to a particular scan-line is contained in nearby scan-lines. Although it is much faster than standard methods, the algorithm is expensive enough that we were forced to distribute the computation over a cluster of computers. Using this algorithm, we produced 500 joint simulations over […], where […] indicates a 5 × 5 km pixel resolution surface over the P. falciparum malarious regions of the world. The simulations and subsequent reductions to maps and/or predictive distributions each took roughly two days of computer time for each of the three global regions and were distributed over compute instances on the Amazon Elastic Compute Cloud (http://aws.amazon.com/ec2).
References
1. Hay SI, Guerra CA, Gething PW, Patil AP, Tatem AJ, et al. (2009) A world malaria map: Plasmodium falciparum endemicity in 2007. PLoS Med 6: e1000048.
2. Abrahamsen P (1997) A review of Gaussian random fields and correlation functions. Blindern, Oslo, Norway: Norwegian Computing Centre. 64 p.
3. Williams CKI (1997) Prediction with Gaussian process: from linear regression to linear prediction and beyond. Technical Report NCRG/97/012. Birmingham, U.K.: Neural Computing Research Group, Department of Computer Science and Applied Mathematics, Aston University. 19 p.
4. Gilks WR, Spiegelhalter DJ (1999) Markov Chain Monte Carlo in practice. Interdisciplinary Statistics. Boca Raton, Florida, U.S.A.: Chapman & Hall / CRC Press LLC.
5. West M, Harrison J (1997) Bayesian forecasting and dynamic models. New York, U.S.A.: Springer-Verlag New York, Inc.
6. Smith DL, Guerra CA, Snow RW, Hay SI (2007) Standardizing estimates of the Plasmodium falciparum parasite rate. Malar J 6: 131.
7. Patil AP, Okiro EA, Gething PW, Guerra CA, Sharma SK, et al. (2009) Defining the relationship between Plasmodium falciparum parasite rate and clinical disease: statistical models for disease burden estimation. Malar J 8: 186.
8. Burden RL, Faires DJ (2004) Numerical analysis. Pacific Grove, California, U.S.A.: Brooks/Cole Publishing Company.
9. Golub GH, van Loan CF (1996) Matrix computations. Baltimore, Maryland, U.S.A.: Johns Hopkins University Press.
10. Balk DL, Deichmann U, Yetman G, Pozzi F, Hay SI, et al. (2006) Determining global population distribution: methods, applications and data. Adv Parasitol 62: 119-156.
11. Dietrich CR, Newsam GN (1997) Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM J Sci Comput 18: 1088-1107.
12. Gething PW, Patil AP, Hay SI (2010) Quantifying aggregated uncertainty in Plasmodium falciparum malaria prevalence and populations at risk via efficient space-time geostatistical joint simulation. PLoS Comput Biol 6: e1000724.