Assessing the Impact of U.S. Food Assistance Delivery Policies on Child Mortality in Northern Kenya

The U.S. is the main country in the world that delivers its food assistance primarily via transoceanic shipments of commodity-based in-kind food. This approach is costlier and less timely than cash-based assistance, which includes cash transfers, food vouchers, and local and regional procurement, where food is bought in or nearby the recipient country. The U.S.’s approach is exacerbated by a requirement that half of its transoceanic food shipments need to be sent on U.S.-flag vessels. We estimate the effect of these U.S. food assistance distribution policies on child mortality in northern Kenya by formulating and optimizing a supply chain model. In our model, monthly orders of transoceanic shipments and cash-based interventions are chosen to minimize child mortality subject to an annual budget constraint and to policy constraints on the allowable proportions of cash-based interventions and non-US-flag shipments. By varying the restrictiveness of these policy constraints, we assess the impact of possible changes in U.S. food aid policies on child mortality. The model includes an existing regression model that uses household survey data and geospatial data to forecast the mean mid-upper-arm circumference Z scores among children in a community, and allows food assistance to increase Z scores, and Z scores to influence mortality rates. We find that cash-based interventions are a much more powerful policy lever than the U.S.-flag vessel requirement: switching to cash-based interventions reduces child mortality from 4.4% to 3.7% (a 16.2% relative reduction) in our model, whereas eliminating the U.S.-flag vessel restriction without increasing the use of cash-based interventions generates a relative reduction in child mortality of only 1.1%. The great majority of the gains achieved by cash-based interventions are due to their reduced cost, not their reduced delivery lead times; i.e., the reduction of shipping expenses allows for more food to be delivered, which reduces child mortality.

works better in unexceptional years, which are far more common; hence, we use only 52 months of data. Furthermore, some observations were missing in each month, and imputing them decreases the quality of predictions, as measured by the root mean square error (RMSE) compared to the forecasts based on an unbalanced panel with missing observations. We removed 14 communities from our data set because of high levels of missing data (our threshold was 25% of observations in a period). The community-level panel is very unbalanced ( Table 1 in [1]) due to poor data collection, entry and storage protocols of the Arid Land's Resource Management Project team of the Government of Kenya. Administrative problems do not appear to have been related to characteristics of the communities, but rather were due to internal management issues. The 2236 children in the remaining 42 communities (Table A) represent 76% of the total population of children in the 56 communities, and we hereafter restrict our analysis to this subset of communities.

A.2 The Effect of Food Assistance
As noted in the main text, because we will be using the MUAC-Z forecasts to optimize the food deliveries, we modify our training sample of MUAC-Z values so that the forecasts are calibrated to forecast the mean MUAC-Z net of the effect of the food assistance that was actually provided. To get this net MUAC-Z value, for each observation we find the amount of food assistance delivered to the community in each of the last 54 months and then subtract the effect of each portion of this food assistance from the observed realization of MUAC-Z.
To perform these calculations, we need to address two issues: missing data and the arrival and departures of children in our sample.
If we do not observe the food assistance delivered in some month within our sample dates, we infer it using a linear model with dummy variables for the month and community (we tried a multiplicative model instead of a linear model, but the linear model produced a better fit). The amount of food assistance that was delivered to a community in a month before the earliest date of our sample is approximated by the mean observed amount of food assistance delivered to that community during the dates covered by our sample.
Turning to the second issue, even though we assume that the effect of food assistance is persistent, 60-month old children age out of our sample and are replaced by 6-month old children, who have not yet received any food aid. These child replacements cause the effect of food assistance in a given month on the mean MUAC-Z of the children in our sample to gradually decrease over the following months. In particular, every month this effect decreases by 1/54 th of its original value because children of ages 5 − 59 months are included in the sample and every month the 59-month-old children (who constitute 1/54 th of the sample because we assume that children are born uniformly over time) age out of our sample and are replaced by 6-month-olds. Therefore, the effect of food assistance on the sample mean τ periods after it was distributed is 54−τ 54 of the immediate effect of food assistance. Finally, our data set provides the amount of food assistance in kilograms, but the food trials (e.g., [3]) measure food in calories. We convert kilograms into kcals by using the data from [2], which states that in 2012, one ton of average food assistance in Kenya satisfied annual caloric requirements for 4.8 people. The daily caloric requirement is 2100 kcal per person, and so 1 kg of food assistance contains on average 4.8(2100)(365)/1000 = 3679 kcal.
As consistency checks, we note that wheat contains 3400 kcal/kg, and Unimix contains 3800 kcal/kg. Recalling from the main text that 500 kcal/day over 90 days increases MUAC-Z by 0.19 [3], we find that 1 kg of food aid increases MUAC-Z by 3679 500 0.19 90 = 0.015 on average, which we denote by d.

A.3 Regression Model
Our dependent variable is Y i,t+τ , which is the mean MUAC-Z in the absence of food assistance in community i = 1, . . . , 42 in month t + τ , as estimated in month t = 1, . . . , 52. For a fixed i and t, we have 66 explanatory variables denoted by X jit for j = 1, . . . , 66, which includes lagged values of the MUAC moments, herd variables, food aid and biophysical variables; see Table 2 for a description of these variables, which is taken from Table 2 in [1]. For a fixed forecast lead time τ , the regression equation is given by (our notation differs slightly from the equation on page 331 of [1]) where β 0 is a constant, u i is the community-specific effect for i = 1, . . . , 42, (φ 0 , . . . , φ 11 ) captures seasonality related to the month of the year, and v it is the unobserved error term, which has mean zero and variance σ 2 . We perform a different regression for each value of the forecast lead time τ , but we suppress this dependence on τ in the parameters (β 0 , β jit , u i , φ t , σ 2 ) in (1) for ease of presentation.
Because there are 119 explanatory variables in our regression, we regularize the estimation procedure by including a ridge penalty [4] with a coefficient chosen by five-fold cross-validation (i.e., the original sample is randomly split into five equal-sized parts, and we repeat the following procedure five times: use four parts as the training data to estimate the forecasting model and use the fifth part as the test data to make forecasts and compute the RMSE). The resulting RMSE was indistinguishable from the RMSE of the model without regularization.
As explained in [1], the estimation procedure might suffer from an endogeneity problem due to correlation between food assistance shipments and unobserved factors. But our main objective is to obtain the best possible predictions (in the RMSE sense), and we sacrifice the causal interpretation of regression coefficients to achieve this; consequently, we will not be interpreting the estimated coefficients.
The forecast accuracy is evaluated by comparing the forecast to an actual realization of mean MUAC-Z values in the absence of food assistance. Five time lags of biophysical variables are included among the predictors, which requires five consecutive observations in the same community to make a forecast for this community. Because the missing data are serially correlated, this requirement does not drastically reduce the amount of available data: ≈ 50% of the observations in our data set can be retro-forecasted to assess the forecast accuracy.
In a real forecasting setting, in each time period forecasts are typically made based on all the data available at that time. To mimic this state of affairs, we re-estimate our forecasting model each month using the data observed up to that time.
Applying our forecasting methodology to the forecast lead times τ = 1, . . . , 12 (Fig A) shows that the RMSE increases until nine months, suggesting that most of the useful information is revealed at most nine months in advance of the forecast target date. Consequently, we set the forecast horizon equal to nine months in §B.
While our approach generates forecasts at the community level, our ultimate goal is the optimization of food assistance shipments into the whole northern Kenya region.
Consequently, in each month we aggregate the forecasts for individual communities by using a weighted average of the community forecasts, where the weight for community i, w i , is the proportion of the population in the 42 communities that is from community i ( Table 1). That t,t+τ is the forecast for community i made in month t for month t + τ , the aggregated forecast for the entire northern Kenya region is given by If we denote the standard deviation of the forecast f (i) t,t+τ by σ (i) t,t+τ , then the variance of the aggregated forecast in (2) is given by Unless otherwise specified, only forecasts aggregated across communities will be discussed.

B Forecast Evolution Model
The forecast evolution model is formulated in §B.1, the long-term mean MUAC-Z in the absence of food assistance and the standard deviation of the population-wide MUAC-Z in the absence of food assistance are estimated in §B.2, and the covariance matrices of the forecast evolution model are estimated in §B.3.

B.1 The Model
Recall that f t−τ,t is the forecast made in month t − τ of the mean MUAC-Z in month t in the absence of food assistance. The evolution of the forecasting model in §A.3 can be described in terms of the forecast updates, which is the amount that the forecast for month t is adjusted (either upward or downward) as we move from month t − τ − 1 to month t − τ . Put another way, the forecast for month t made in month t − τ (i.e., f t−τ,t ) equals the forecast for month t made in month t − τ − 1 (i.e., f t−τ −1,t ) plus the forecast update (i.e., t−τ,t ). We are only considering forecast updates made up to nine periods ahead (Fig A), and assume that any forecast for month t made more than nine months ahead isμ tmod12 , which is the long-term mean MUAC-Z in the absence of food assistance in the month corresponding to time period t. Hence, the first forecast for the mean MUAC-Z in month t using the regression model is made in month t − H and is f t−H,t =μ tmod12 + t−H,t . In the following month, we change the forecast to f t−H+1,t = f t−H,t + t−H+1,t =μ tmod12 + t−H,t + t−H+1,t , and so on. Using equation (5) in an iterative manner, we find that the realized mean MUAC-Z in month t in the absence of food assistance, denoted by y t (which also equals f t,t ), can be expressed as Thus far, the forecast evolution model in (5)-(8) is identical to the Martingale Model of Forecast Evolution (MMFE) [5], aside from the fact that we incorporate seasonality into the long-term forecast,μ tmod12 . The MMFE model assumes that the vector ( t,t , t,t+1 , . . . , t,t+H ) of forecast updates made in month t form an independent and identically distributed (iid) sequence of normal random vectors. As noted in the main text, we generalize the MMFE model by incorporating two kinds of covariance among forecast updates: covariance among forecast updates for the same target month, given by Cov( t−τ 1 ,t , t−τ 2 ,t ) = Σ 1 τ 1 ,τ 2 , and covariance among updates made in the same month, given by Cov( t,t+τ 1 , t,t+τ 2 ) = Σ 2 τ 1 ,τ 2 . All other pairs of updates are assumed to be uncorrelated: Cov( t 1 ,t 2 , t 3 ,t 4 ) = 0 unless t 1 = t 2 or t 3 = t 4 . In our model, forecast updates are assumed to have zero mean and have a multivariate normal distribution with the covariance structure described above. Note that the diagonals of the 10 × 10 covariance matrices Σ 1 and Σ 2 are the same by construction.
Also, in the classical MMFE model, the matrix Σ 1 is diagonal -and hence the covariance is characterized by Σ 2 -because there are no correlations among forecast updates made in different months.
The mean squared error of a forecast with lead time k is

B.2 Estimating the Long-term Mean and the Standard Deviation
We estimate the long-term MUAC-Z mean for each month of the year in the absence of food assistance (μ tmod12 ) by taking a population-weighted average of MUAC-Z means in each month and averaging the result across years. The mean MUAC-Z ranges from a low of -2.43 in October to a high of -2.28 in February (Fig B). we observe that changes in the mean of MUAC-Z are significantly larger than changes in its standard deviation. The standard deviation of MUAC-Z is estimated by taking a sample analogue of (4). Because these standard deviations have very little variation across months, we assume that the standard deviation is independent of the month of the year. Averaging across months yields σ = 0.62.

B.3 Estimating the Covariance Matrices
We use the first two years of data to estimate the parameters in the regression model in (1), and then use the remainder of the data to generate forecasts up to nine months ahead.
Forecasts are generated for individual communities and then aggregated into forecasts for the entire region using (2). Sequential forecasts are turned into forecast updates by taking first differences according to equation (5), and the covariance matrices Σ 1 , Σ 2 are estimated from these forecast updates.
The sample covariances used to estimate Σ 1 and Σ 2 are regularized to ensure stationarity of the forecast update process and to account for the small sample size (we have only 33 observations to estimate 55 covariance parameters in each matrix). More specifically, regularization is performed by sequentially applying factor analysis and shrinkage to the sample covariance matrices Σ 1 and Σ 2 , where factor analysis is used to guarantee stationarity of the forecast updates process and shrinkage is used to compensate for the small sample size.
Factor analysis attempts to address the possibility that the variation of observed forecast updates is driven by a small number of underlying unobserved random variables called factors. This method [6] approximates a covariance matrix Σ by the sum of a diagonal matrix Ψ and a low-rank matrix LL T , where L ∈ R p×k , p = 10 is the number of rows or columns in Σ, and k < p is the rank of LL T ; the value of k is specified later in this subsection. Factor analysis regularizes the estimate by reducing the number of parameters to be estimated from to p + pk.
Shrinkage works by estimating Σ by a weighted average of the sample covarianceŜ, which is a no-bias, high-variance estimator, and the identity matrix I, which represents a high-bias, minimum-variance estimator, where Tr(Ŝ) is the trace of the matrixŜ, and ρ is the shrinkage coefficient, which can range from 0 to 1. We use the Oracle-Approximating Shrinkage (OAS) estimator [7], which minimizes the expected matrix norm of an error within the class of shrinkage estimators (which includes the sample covariance as a trivial case with zero shrinkage), yielding the optimal shrinkage coefficient where n is the sample size. As expected for a small-sample correction, ρ → 0 in (15) and Σ →Ŝ in (14) as n → ∞.
We cannot directly apply (15) because we do not know the true covariance matrix Σ.
Instead, we use an iterative procedure, which switches between the estimation of Σ and the optimal ρ for 20 iterations (during which convergence was achieved), as follows: 1. Set i = 0 and initialize the estimate of the covariance matrix with the sample covariance matrix:Σ 0 =Ŝ.
2. Estimate of optimal shrinkage coefficient using the current estimate of the covariance 3. Update the covariance matrix estimate using the current estimate of the optimal shrinkage coefficient: 4. If i < 20, go to step 2; otherwise, increment i by one and continue to step 5.
5. Stop and returnΣ 21 as the covariance matrix estimate, ρ 21 as the optimal shrinkage coefficient.
To specify the rank k of the matrix LL T in the factor analysis, we find for each k the minimum value of the shrinkage coefficient ρ that makes the forecast update process stationary. Fig C shows the boundary in the two-dimensional (ρ, k) space between forecast update stationarity and nonstationarity. We set k = 3, which reduces the number of parameters to be estimated from 55 to 40, while decreasing the minimum necessary shrinkage from 0.295 (the value in the absence of factor analysis) to 0.231. This reduction in ρ is desirable because too much shrinkage (i.e., a high value of ρ) can bias the off-diagonal elements of the estimate towards zero. We set the shrinkage coefficient to ρ = 0.232, which is below the value of 0.246 derived in (15) because part of the regularization is performed through factor analysis, and obtain a stationary forecast update process with a maximum absolute eigenvalue equal to 0.98.
In summary, we sequentially apply factor analysis and shrinkage with rank k = 3 and shrinkage coefficient ρ = 0.232 to both Σ 1 and Σ 2 , which generates sufficient regularization to ensure stationarity of the forecast updates process without strongly biasing the off-diagonal elements towards zero.
We conclude this subsection by describing how we deal with missing data. Nearly 50% of the forecasts for individual communities cannot be generated because the required data are missing. It is necessary to infer the missing forecasts to obtain the aggregated forecasts in (2). If a forecast made in month t for month t + τ and community i is missing, we substitute it with an average of the two forecasts for community i that are: 1. Made for the target month t + τ in the month closest to t, i.e. f i t+s,t+τ , where s ∈ Z has the smallest absolute value for which observation f i t+s,t+τ is not missing (if there is a tie for the closest absolute value, an average of the two forecasts is used); 2. Made in month t for the target month t + τ + s closest to t + τ , i.e. f i t,t+τ +s , where s ∈ Z has the smallest absolute value for which observation f i t,t+τ +s is not missing (if there is a tie for the closest absolute value, an average of the two forecasts is used).
This simple inference method implicitly assumes that forecasts are smooth functions of the months when they are made and targeted for, and that data are missing at random.
We also tested two other approaches to infer the missing data: using the forecast from item 1 above (i.e., using only forecasts made in the same month) and using the forecast from item 2 above (i.e., using only forecasts for the same target month). All three inference methods lead to virtually the same covariance matrix estimates, highlighting the robustness of our approach.

C Optimizing the Food Assistance Decisions
In this section, we choose the monthly food assistance quantities to minimize child mortality subject to an annual budget constraint. Child mortality is specified in terms of MUAC-Z in §C.1. In §C.2, we derive a closed-form solution for the simplified problem that has a single delivery mode and allows monthly orders to be negative. This closed-form solution yields monthly quantities that are affine in all previous orders, all previously observed forecast updates, and the available budget. In §C.3, we return to our original problem, which has two delivery modes and requires monthly orders to be nonnegative. Motivated by the results in §C.2 and the fact that this problem is difficult to solve exactly, we numerically optimize over the restrictive class of policies that are affine in all previous orders, all previously observed forecast updates, and the current budget.

C.1 The Relationship Between MUAC-Z and Child Mortality
The goal of the optimization problem is to minimize the expected number of child deaths.
We assume that the monthly mortality rate for a child with MUAC-Z value z is e a−bz for parameters a and b. In [8], the monthly mortality rate is estimated to depend on weightfor-age Z-score (WAZ) via e −7.13−0.722z , and we extrapolate this dependence and assume that mortality rate depends on MUAC-Z in the same way as on WAZ (see Table 2 in [9] for a justification of this extrapolation), and set a = −7.13, b = 0.722.
We also assume that MUAC-Z for children in any given month has a normal distribution. If the mean and variance of the distribution are µ and σ, the expected number of child deaths in the population of N children in this month is = N e a+ b 2 σ 2 2 −bµ .

C.2 The Exact Solution With a Single Shipment Mode
In this subsection, we consider the simplified problem that has one shipment mode and allows negative food shipments. In our problem formulation, we make the following 11 assumptions: 1. There is a single delivery mode with a lead time of L months; 2. There is a single community with N children; 3. The MUAC-Z among children in the community in month t has a normal distribution with mean µ t and standard deviation σ; 4. By (8), the mean MUAC-Z in month t in the absence of food assistance isμ tmod12 + H τ =0 t−τ,t , whereμ tmod12 is the long-term mean MUAC-Z in month t in the absence of food assistance, t−τ,t is the forecast update for month t made in month t − τ , and H = 9 mo is the forecast horizon. Similarly, the forecast for month t made in month 5. The mean MUAC-Z in month t depends linearly on the food assistance delivered in the last 54 months (see §A.2) via where d = 0.015/kg is the impact of food assistance on the mean MUAC-Z, N is the number of children, and O t is the order placed in month t; 6. The vector of forecast updates, ( t,t , t,t+1 , . . . , t,t+H ) has a multivariate normal distribution with mean zero and the covariance structure described in § B: 7. The standard deviation of MUAC-Z in each month, σ = 0.62, is assumed to be known in advance; 8. Month T = 120 is the last month that we can place an order, and month T + L -when that order is delivered -is the last period that we include in the objective function; 9. Orders are allowed to be negative, which corresponds to the collection of food from the communities to increase the available budget and enable larger food shipments in future months; 10. The objective is to minimize the expected number of child deaths (as given in (8)) during months 1, . . . , T + L; 11. An annual budget B is made available in months t = 1, 13, 25, . . .. Each order decreases the available budget according to where B t is the budget available in month t, c is the cost per kg of food assistance, and we assume that the remaining budget is used in the final month of each fiscal year.
This problem can be formulated and solved as a dynamic program [10]. In month t, the decision is the order quantity O t and the system state of the dynamic program consists of all previous orders (O t−54 , . . . , O t−1 ), all (H +1)(H +2)/2 forecast updates observed up to month t that are for the current or future months ( t−H,t ; t−H+1,t , t−H+1,t+1 ; . . . ; t−1,t , t−1,t+1 , . . . , t−1,t+H−1 ; t,t , t,t+1 , . . . , t,t+H ), and the available budget B t . Because the order quantity O t impacts child mortality only in months t + L, . . . , T + L, the optimal expected cost-to-go in month t, denoted by J t , represents the cost starting from month t + L, and the expectation evaluated in month t, denoted by E t [·], is with respect to the forecast updates made in month t + 1 ( t+1,t+1 , . . . , t+1,t+1+H ). The solution to this dynamic program is given in the following theorem.
Theorem 1. The optimal expected cost-to-go in month t is the exponent of an affine function The optimal order in month t is an affine function Proof. We prove the theorem by backward induction, starting with month T , which is the last month of its year (i.e., T mod 12=0). Hence, the entire available budget is used; i.e., The decision O * T affects child mortality only in month T + L, yielding Equation (26) proves the base of induction and establishes the coefficients for t = T : Next, we suppose that the relationships in (26)-(34) hold in months t + 1, ..., T , and prove that they hold in month t. We let C t be the child mortality rate in month t, and note that the dynamic programming recursion dictates that the optimal order quantity O t minimizes E t [C t+L + J t+1 ] subject to the budget equation in (24), where We define the functions so that equations (35) and (37) can be re-expressed as for a random variable X ∼ N (µ, σ 2 ), we need to find the conditional mean and variance of t+1,t+1+τ 2 in (39), given { } t . These forecast updates have a multivariate normal distribution, and hence the conditional mean is linear in the variables that we condition on: For a given τ , the vector of coefficients p τ τ 1 ,τ 2 in (42) can be expressed via the covariance matrices Σ 1 , Σ 2 in (23) as follows: where Σ τ 12 is a vector of covariances of the random variable t+1,t+1+τ with the vector { } t , and Σ 22 is the covariance matrix of the vector { } t . Denoting the variance of ( H by V (γ (t+1) ), we can express the expectation in (39) as Moreover, using the change of variables t 1 = τ 1 − 1 and t 2 = τ 2 + 1 and letting I {x} denote the indicator function of x, we can write the double summation in (39) as Using (46) and (48), we can re-express (39) as By (40)-(41), the optimization problem is which is convex. If β (t+1) 1 − cδ (t+1) > 0 (we prove later that this is indeed the case), this problem has a unique minimum defined by the first-order condition The solution to (52) is which, after substituting in A t+L and A t+1 from (38) and (49), is an affine function of . . , O t−1 with parameters given by Substituting (53) into (51), we get the optimal cost-to-go starting from period t, Because A t+L and A t+1 are both exponentials of affine functions of { } t , B t , O t−53 , . . . , O t−1 , the cost-to-go function J t is also an exponential of an affine function with the following parameters: We complete the proof by showing that β Assuming (64) and (66) hold for month t + 1, we now show that they hold for month t. We have Similarly, = − 1 55 = − 1 55 = bd 54N . (73)

C.3 Numerical optimization
The model described in §C.2 is based on two unrealistic assumptions: single delivery mode and negative orders. Because we were not able to obtain a closed-form solution without these assumptions and because the original problem (i.e., with two delivery modes and nonnegative orders) is unwieldy, we resort in this subsection to a suboptimal approach to the original problem. More specifically, motivated by Theorem 1, we numerically find the death-minimizing policy among all ordering policies that are affine in the state variables.
To allow for two delivery modes with nonnegative orders, we modify Assumptions 1, 4, 9 and 11 in §C.2: 1 . There are two delivery modes: a fast mode with a lead time of L f months and a slow mode with a lead time of L s months; 4 . The mean MUAC-Z in month t depends linearly on the food assistance delivered in the last 54 months (see §A.2) via where µ 0 through their weighted linear combination 53 τ =1 O m t−τ 54−τ 54 , to depend on forecast updates only through forecasts for future months, to have the coefficients of the available budget and the forecasts be exponential functions of the month of the year and the forecast lead time.
Imposing these restrictions has allowed us to improve the out-of-sample performance of our policies. Adapting the relevant notation from Theorem 1, we search for the best policy from the affine parametric class defined by We evaluate the cost of a given policy, defined by the 12 parameters (ν f , λ f and not orders placed before the beginning of the simulation period (which we assume to be placed by allocating annual budgets equally among 12 months regardless of state variables). The direct application of (77) could result in negative orders or orders larger than the available budgets. In such cases, no order is placed or the entire budget is spent, respectively.
Optimal values of the 12 parameters in (77) are searched for numerically by the Nelder-Mead algorithm [11], which is a robust algorithm that can deal with objective functions that lack smoothness. Because this algorithm finds only a local minimum and our objective function is non-convex, we use multiple random restarts from multiple (in our case, 300) points around the starting point, with each component of the parameter vector randomly changed by up to 20%. We use starting values inspired by Theorem 1, except for η f and η s , which are optimized conditional on the remainder of the initial values: