## Abstract

Designing management policies in ecology and agroecology is complex. Several components must be managed together while they strongly interact spatially. Decision choices must be made under uncertainty on the results of the actions and on the system dynamics. Furthermore, the objectives pursued when managing ecological systems or agroecosystems are usually long term objectives, such as biodiversity conservation or sustainable crop production. The framework of Graph-Based Markov Decision Processes (GMDP) is well adapted to the qualitative modeling of such problems of sequential decision under uncertainty. Spatial interactions are easily modeled and integrated control policies (combining several action levers) can be designed through optimization. The provided policies are adaptive, meaning that management actions are decided at each time step (for instance yearly) and the chosen actions depend on the current system state. This framework has already been successfully applied to forest management and invasive species management. However, up to now, no “easy-to-use” implementation of this framework was available. We present GMDPtoolbox, a Matlab toolbox which can be used both for the design of new management policies and for comparing policies by simulation. We provide an illustration of the use of the toolbox on a realistic crop disease management problem: the design of long term management policy of blackleg of canola using an optimal combination of three possible cultural levers. This example shows how GMDPtoolbox can be used as a tool to support expert thinking.

**Citation: **Cros M-J, Aubertot J-N, Peyrard N, Sabbadin R (2017) GMDPtoolbox: A Matlab library for designing spatial management policies. Application to the long-term collective management of an airborne disease. PLoS ONE 12(10): e0186014. https://doi.org/10.1371/journal.pone.0186014

**Editor: **David Gent, US Department of Agriculture, UNITED STATES

**Received: **June 9, 2017; **Accepted: **September 22, 2017; **Published: ** October 5, 2017

**Copyright: ** © 2017 Cros et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All files for model and results are available from Figshare at https://figshare.com/articles/Sustainable_collective_pest_management_at_the_landscape_scale_with_GMDPtoolbox/3759465 (DOI 10.6084/m9.figshare.3759465).

**Funding: **This work has been partly supported by the European Union Seventh Framework Program (FP7/ 2007-2013) under the grant agreement n265865- PURE (https://ec.europa.eu/research/fp7). This study was also funded by the ANR project AGROBIOSE (ANR-13-AGRO-0001) (http://www.agence-nationale-recherche.fr/?Projet=ANR-13-AGRO-0001). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Management problems in ecology and agroecology are complex because several components must be managed together while spatial interactions occur among them. In addition, management actions are applied at a local level while the objective is often defined at a larger level. For instance, for optimizing biodiversity conservation, protection actions may target only a few species or habitats, while the whole biodiversity is of interest. The choice of the target species/habitats depends strongly on the ecological interaction network between the species and habitats [1]. Ecosystem services are usually expected at the regional level, while management actions are applied at the field level [2, 3]. In agroecology, many processes occur at levels higher than the field level because interactions take place among landscape components (commercial fields and interstitial spaces) through biotic and abiotic flows. For instance, erosion problems must be managed collectively at the catchment basin level [4]. In addition, spatial dispersion of pests and beneficials creates spatial dependencies between fields and other habitats [5]. A given proportion of refuge fields must be maintained at the landscape level in order to limit adaptation of insects to genetically modified Bt crops [6]. Lastly, management of long-term pesticide durability must be applied at the landscape level for fungicides [7], insecticides [8], and herbicides [9].

Another feature of these management problems is that the objective is a long-term one: biodiversity and production must be preserved in a sustainable way and decisions are not taken once and for all. Instead, sequences of decisions must be taken without a precise and deterministic knowledge of the potentially delayed effects of the decisions on the system [10].

Markov Decision Processes (MDPs [11, 12]) form a suitable framework for modeling and solving problems of sequential decision under uncertainty. An MDP is defined in terms of state variables, action variables, transition probability functions and reward functions. Solving an MDP amounts to finding the policy that optimizes the expected sum of future rewards, over a given time horizon. There exist several freely available toolboxes for solving Markov Decision Processes [13–16]. However, their direct application to domains like ecology or agroecology is difficult when there are a large number of state variables together with a large number of action variables.

Several approaches have been proposed for solving MDPs with multidimensional state and action spaces (FA-FMDPs [17–19]). In general, such methods do not compute an optimal global policy for a given objective, but only an approximate one. A global policy is a set of decision rules that prescribe the actions to apply in any particular entity (e.g. a field, a species) depending on the current state of all the considered entities. In practice, computing and even representing global solution policies for FA-FMDP may quickly become too difficult when the number of state and action variables increases. In addition, it is not always realistic to assume that complete knowledge of the values of all state variables is available when deciding the value of a local management action variable. Therefore, most approaches for solving large FA-FMDPs have tried to overcome this problem by computing approximate policies which are local, in the sense that the decision rule prescribes the action to apply locally, based only on the current states of the few entities in direct interaction.

One such approach is the Graph-based MDP framework (GMDP [20, 21]) and the associated solution algorithms. In a GMDP, each entity is represented as a node of a graph. Each node is associated with a pair of state/action variables. The graph edges represent local dependencies in the transition and reward functions. For a fixed policy, the dynamics model is a Dynamic Bayesian Network [22]. Its graphical representation provides an easy interpretation of the local dependencies in the GMDP model. Algorithms dedicated to GMDPs usually find “good” local policies [20, 21], but without any optimality guarantee.

The GMDP framework has already been used to model management problems and to derive policies in various fields: plant disease management [23], human disease management [24], forest management [25], and invasive species control [26].

In this article, we present GMDPtoolbox, a Matlab toolbox which is useful for modeling spatial management problems, for designing and analyzing policies and for comparing given policies by simulation. It provides implementations of the *Approximate Linear Programming* and the *Mean-Field Approximate Policy Iteration* algorithms proposed in [20]. We first briefly describe the GMDP framework, as well as the two above-mentioned algorithms. Then, we describe the functionalities of the toolbox: GMDP solution functions, policy analysis tools and documentation. Finally we provide an illustration of the use of the toolbox, on a realistic crop disease management problem: the design of optimal long term management policies (or strategies) of blackleg on canola (UK: Phoma stem canker on oilseed rape), through the use of three management levers: specific genetic resistance, tillage and cultural control. We show how spatial interactions can be modeled and how collective (at the scale of an agricultural area) and integrated (combining several action levers) control policies can be proposed to support expert thinking.

## The GMDP framework

As a tutorial example, we describe the GMDP framework with the particular interpretation of entities as sites of a spatial area. A site can be a crop field, a forest stand, etc. However, interactions in a GMDP are not limited to the modeling of spatial interactions. They can be, for instance, trophic or ecological interactions.

### Definitions

A discrete-time GMDP is defined by a 5-tuple <*S*, *A*, *N*, *p*, *r*> (see Table 1 for a list of variable definitions) where:

- *S* is the state space, *S* = *S*_{1} × … × *S*_{n}, with *S*_{i} the finite state space of site *i*.
- *A* is the action space, *A* = *A*_{1} × … × *A*_{n}, with *A*_{i} the finite action space of site *i*.
- *N* is the set of site neighborhoods, *N* = {*N*_{i}, ∀*i* = 1, …, *n*}, where *N*_{i} ⊆ {1, …, *n*} is the set of neighbors of site *i*. Note that it is possible that *i* ∈ *N*_{i}, but this is not mandatory.
- *p* is the set of local site transition probability functions,

  $$p = \{p_i(s'_i \mid s_{N_i}, a_i),\ \forall i = 1, \dots, n,\ \forall s'_i,\ \forall s_{N_i},\ \forall a_i\},$$

  where *p*_{i}(*s*′_{i}|*s*_{Ni}, *a*_{i}) is the (stationary) probability for site *i* of transitioning to *s*′_{i} at time *t* + 1 given that at time *t* the neighborhood of the site is in state *s*_{Ni} = {*s*_{j}, *j* ∈ *N*_{i}} and action *a*_{i} is performed.

  The global transition probability is factored according to the local transition probabilities: if *s* = (*s*_{1} … *s*_{n}), *s*′ = (*s*′_{1} … *s*′_{n}) and *a* = (*a*_{1} … *a*_{n}) are global state and action vectors, then

  $$p(s' \mid s, a) = \prod_{i=1}^{n} p_i(s'_i \mid s_{N_i}, a_i).$$

- *r* is the set of local site reward functions, *r* = {*r*_{i}(*s*_{Ni}, *a*_{i}), ∀*i* = 1, …, *n*, ∀*s*_{Ni}, ∀*a*_{i}}, with *r*_{i} the reward obtained from site *i* at time *t* when the neighborhood of site *i* is in state *s*_{Ni} and action *a*_{i} is performed.

  The global reward is the sum of the local ones:

  $$r(s, a) = \sum_{i=1}^{n} r_i(s_{N_i}, a_i).$$
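
The factorization of the global transition probability into a product of local terms, and of the global reward into a sum of local terms, can be sketched in a few lines of Python. This is an illustration only and not part of GMDPtoolbox (which is written in Matlab): the function names, the 0-based site indices and the two-site example with its local functions are all assumptions made for the sketch.

```python
from math import prod


def global_transition_prob(s_next, s, a, neighbors, local_p):
    """p(s'|s,a) as the product over sites of p_i(s'_i | s_{N_i}, a_i)."""
    return prod(
        local_p[i](s_next[i], tuple(s[j] for j in neighbors[i]), a[i])
        for i in range(len(s))
    )


def global_reward(s, a, neighbors, local_r):
    """r(s,a) as the sum over sites of r_i(s_{N_i}, a_i)."""
    return sum(
        local_r[i](tuple(s[j] for j in neighbors[i]), a[i])
        for i in range(len(s))
    )


# Hypothetical two-site example with binary states (0/1) and actions (0/1).
neighbors = [(0, 1), (0, 1)]  # each site sees both sites (0-based indices)


def p_local(s_next_i, s_nbrs, a_i):
    # Assumed dynamics: the site moves to the largest neighborhood state w.p. 0.9.
    return 0.9 if s_next_i == max(s_nbrs) else 0.1


def r_local(s_nbrs, a_i):
    # Assumed reward: action value minus the number of sites in state 1.
    return float(a_i) - sum(s_nbrs)


local_p = [p_local, p_local]
local_r = [r_local, r_local]
print(global_transition_prob((1, 1), (0, 1), (0, 0), neighbors, local_p))
print(global_reward((0, 1), (1, 0), neighbors, local_r))
```

Only the local functions are stored, so the model size grows with the number of sites and the neighborhood sizes rather than with the size of the global state space.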

In a usual MDP [11], a function *δ*: *S* → *A* assigning an action to each state is called a *stationary decision rule* or *policy*. Once a policy *δ* is fixed, the MDP defines a stationary Markov Chain over *S*, with transitions *p*_{δ}(*s*′|*s*) = *p*(*s*′|*s*, *δ*(*s*)). The infinite horizon discounted value *v*_{δ}(*s*) of a policy *δ*, applied to an MDP with initial state *s*, is defined as:

$$v_\delta(s) = \mathbb{E}\left[\sum_{t=0}^{+\infty} \gamma^t\, r(s^t, \delta(s^t)) \,\middle|\, s^0 = s\right].$$

The expectation is taken over all possible trajectories 〈*s*^{0}, *δ*(*s*^{0}), *s*^{1}, …, *s*^{t}, *δ*(*s*^{t}), … 〉 starting from the initial state *s*^{0} and applying policy *δ*. The discount factor, 0 ≤ *γ* < 1, ensures that the above infinite sum converges. It also takes into account the fact that there is a difference between the “future value” of a reward and the “present value” of the same reward. The problem of finding the optimal policy for a stationary MDP, or solving the MDP, can be written as:

$$\text{find } \delta^*: S \to A \text{ such that } v_{\delta^*}(s) \ge v_\delta(s),\ \forall s \in S,\ \forall \delta \in A^S.$$

It has been shown that there always exists an optimal policy [11], and that it can be computed in time polynomial in the size of *S* and *A*, using *Stochastic Dynamic Programming* algorithms such as *Policy Iteration* and *Value Iteration*, or *Linear Programming* algorithms [11].
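
As a reminder of what such an exact algorithm looks like, here is a minimal Value Iteration sketch in Python for a small, explicitly enumerated MDP. It is a generic textbook version, not code from GMDPtoolbox, and the two-state example (states, actions, rewards) is hypothetical.

```python
def value_iteration(states, actions, p, r, gamma=0.95, tol=1e-8):
    """Return (optimal values, greedy policy) of a finite discounted MDP.

    p(s_next, s, a) is a transition probability and r(s, a) a reward.
    """
    v = {s: 0.0 for s in states}
    while True:
        # Bellman backup: q[s][a] = r(s, a) + gamma * E[v(s') | s, a]
        q = {
            s: {a: r(s, a) + gamma * sum(p(s2, s, a) * v[s2] for s2 in states)
                for a in actions}
            for s in states
        }
        v_new = {s: max(q[s].values()) for s in states}
        if max(abs(v_new[s] - v[s]) for s in states) < tol:
            return v_new, {s: max(q[s], key=q[s].get) for s in states}
        v = v_new


# Hypothetical two-state example: reward 1 in state 1, 0 in state 0;
# "stay" keeps the state and "switch" flips it (deterministic moves).
states, actions = (0, 1), ("stay", "switch")


def p(s_next, s, a):
    target = s if a == "stay" else 1 - s
    return 1.0 if s_next == target else 0.0


def r(s, a):
    return float(s)


values, policy = value_iteration(states, actions, p, r, gamma=0.95)
print(policy)  # {0: 'switch', 1: 'stay'}
```

Each sweep loops over every global state, which is exactly what becomes infeasible when the state space is a product over many sites.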

Since a GMDP is a particular case of MDP, it can be solved using MDP solution algorithms. However, the complexity of these algorithms, which is polynomial in |*S*| and |*A*|, is exponential in the number of sites *n*, because |*S*| and |*A*| are themselves products over the *n* sites. Thus, they are impractical when *n* becomes large. Furthermore, an MDP solution policy, *δ*: *S* → *A*, also takes exponential space to represent.

For all these reasons, only approximate solution policies are usually looked for in GMDP problems: the search space is limited to a subset of policies that exploit the notion of neighborhood, namely the set of *local policies*. A policy *δ*: *S* → *A* is said to be *local* if and only if *δ* = (*δ*_{1}, …, *δ*_{n}) where *δ*_{i}: *S*_{Ni} → *A*_{i} (instead of *δ*_{i}: *S* → *A*_{i}). It means that the choice of the action applied on site *i* depends only on the state of its neighbor sites (instead of the state of all sites).
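
The gain in representation size can be made concrete with a short count. The sketch below (plain Python, with a hypothetical chain of 30 binary sites in which each site's neighborhood is itself and its adjacent sites) compares the number of entries needed to tabulate a global policy with the total number of entries of the local decision rules.

```python
from math import prod


def global_policy_size(state_sizes):
    """Number of global states, i.e. rows in a table for delta: S -> A."""
    return prod(state_sizes)


def local_policy_size(state_sizes, neighbors):
    """Total number of rows over the local tables delta_i: S_{N_i} -> A_i."""
    return sum(prod(state_sizes[j] for j in n_i) for n_i in neighbors)


# Hypothetical chain of n binary sites; N_i = {i-1, i, i+1} clipped at the ends.
n = 30
chain_sizes = [2] * n
chain_neighbors = [
    tuple(j for j in (i - 1, i, i + 1) if 0 <= j < n) for i in range(n)
]

print(global_policy_size(chain_sizes))                 # 1073741824 (= 2**30)
print(local_policy_size(chain_sizes, chain_neighbors))  # 232 (= 4 + 28*8 + 4)
```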

### Two algorithms for approximate resolution of GMDP

The two algorithms implemented in GMDPtoolbox provide local policies by approximating the optimal solution of a GMDP [20]. The first one, referred to as MF-API, exploits the structure of the neighborhood relationships of the GMDP and computes a *mean-field approximation* of the value function of a policy. This algorithm belongs to the family of *Approximate Policy Iteration* (API) algorithms [27]. The second one is a specific *Approximate Linear Programming* algorithm derived from the general class of ALP algorithms [28] and adapted to the GMDP framework. Previous experimental comparisons have shown that the two algorithms provide local policies of similar quality, outperforming naive policies such as greedy or random policies. However, the MF-API algorithm provides a higher-quality approximation of the expected value of the returned policy than the ALP algorithm, which is faster. Thus, the two methods can be seen as complementary. We refer the reader to [20] for a full description of these two algorithms and their comparison.

## GMDPtoolbox

This section describes i) how the Matlab GMDPtoolbox can be used to model spatial management problems; ii) how the value of a policy is computed; and finally iii) how to generate spatio-temporal simulations of the system under the application of a given policy for given initial states. These aspects are illustrated on a generic toy problem in the domain of crop protection.

**Description of a simple epidemiological toy model.** For didactic purposes, we consider a simple implementation of a generic epidemiological toy model with GMDPtoolbox. We consider a situation with 3 commercial fields in which 2 different crops can be sown. One of these crops yields a high profit but is susceptible to a pathogen, and when it is infected the profit is reduced. A second crop, which yields a lower profit, can be sown instead. This second crop is not susceptible to the pathogen and eliminates it from the field (this is the main interest of this crop). The problem is then to decide on a long-term crop policy (or strategy), at the scale of the landscape (three fields).

Each field can be described by two states: uninfected (coded 1) or infected (coded 2), |*S*_{i}| = 2,∀*i* = 1, 2, 3. Crop management decisions are taken with a yearly time step and only two actions can be applied to each field: either the high-profit susceptible crop is sown (coded 1) or the low-profit resistant one (coded 2), |*A*_{i}| = 2,∀*i* = 1, 2, 3. The problem is to identify the policy that maximizes the expected cumulative profit on a long-term basis. The topology of the considered area can be represented by a graph (see Fig 1A). In this graph, each node represents a commercial field. A directed edge between two nodes represents potential contamination flows. The neighborhood relationships are here symmetric: *N*_{1} = {1, 2}, *N*_{2} = {1, 2, 3}, *N*_{3} = {2, 3}. The GMDP representation of transitions, policies and rewards structures is displayed in Fig 1B.

Blue (respectively red, green) nodes represent state variables (respectively action variables, reward functions). Blue (respectively red, green) arrows model the influence on state variables (respectively action variables, reward functions).

Transition probabilities are defined from the following probabilities:

- the probability *p*_{ϵ} of long-distance contamination of fields containing susceptible crops,
- the probability *p*_{c} that a field containing a susceptible crop be contaminated from an infected neighboring field.

The probability that a non-infected field sown with the susceptible crop at time step *t*, with *m*_{i} infected neighboring fields, moves to state infected at time *t* + 1 is then defined by:

$$P(s_i^{t+1} = 2 \mid s_i^{t} = 1, a_i^{t} = 1, m_i) = p_\epsilon + (1 - p_\epsilon)\left(1 - (1 - p_c)^{m_i}\right).$$

Note that a field can be non-infected either because the non-susceptible crop was used or because the susceptible one was, but did not get infected by the pathogen. In addition, if a field was infected at time *t* and is still sown with the susceptible crop (*a*_{i} = 1), then the field remains infected at time *t* + 1 with probability 1. If a non-susceptible crop is used (*a*_{i} = 2) at time *t* then the field becomes uninfected with probability 1 at *t* + 1.

Profit for each field is impacted by the nature of the chosen crop (*a*_{i} = 1, 2), and its state (infected or uninfected). The minimum profit (noted *r*_{0}) is obtained when the non-susceptible crop is used (*a*_{i} = 2), while the maximum profit (*r*_{m} + *r*_{0}) is obtained when the susceptible crop is sown and the field is not infected. An intermediate profit (*r*_{m}/2 + *r*_{0}) is obtained when the susceptible crop is used while the field is infected. For a field *i* in state *s*_{i} at time *t*, the rewards obtained when action *a*_{i} is performed are given in Table 2. In order to simplify the analysis of optimal policies, we arbitrarily fixed *r*_{0} = 0 in the Toolbox example implementation.
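
The local model of the toy problem is small enough to write down directly. The Python sketch below is an illustration and not the toolbox implementation: it assumes that long-distance contamination and contamination from each infected neighbor are independent events, which gives an infection probability of *p*_{ϵ} + (1 − *p*_{ϵ})(1 − (1 − *p*_{c})^{*m*_{i}}) for an uninfected field sown with the susceptible crop. Function names and the numerical values in the usage lines are assumptions.

```python
def infection_prob(m_i, p_eps, p_c):
    """Probability that an uninfected susceptible field becomes infected,
    given m_i infected neighbors (independent contamination events assumed)."""
    return p_eps + (1 - p_eps) * (1 - (1 - p_c) ** m_i)


def transition_prob(s_next, s_i, a_i, m_i, p_eps, p_c):
    """States: 1 = uninfected, 2 = infected. Actions: 1 = susceptible crop,
    2 = resistant crop."""
    if a_i == 2:        # resistant crop: the field is uninfected next year
        p_inf = 0.0
    elif s_i == 2:      # infected and still sown with the susceptible crop
        p_inf = 1.0
    else:               # uninfected, susceptible crop
        p_inf = infection_prob(m_i, p_eps, p_c)
    return p_inf if s_next == 2 else 1.0 - p_inf


def reward(s_i, a_i, r_m, r_0=0.0):
    """Local profit as described in the text (r_0 = 0 as in the toolbox example)."""
    if a_i == 2:
        return r_0
    return r_0 + (r_m if s_i == 1 else r_m / 2)


# Assumed parameter values, for illustration only.
print(infection_prob(2, p_eps=0.1, p_c=0.3))
print(reward(2, 1, r_m=10.0))  # 5.0
```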

**Describing and analyzing a policy.** On this toy example the MF-API and the ALP solution algorithms lead to the same policy (experiments were run with a discount factor equal to 0.95). This policy can be difficult to interpret since it corresponds to a set of local functions *δ*_{i} from *S*_{Ni} to *A*_{i}. In GMDPtoolbox, one of the proposed visualizations shows the proportion of each action applied for each site, depending on the site state (see Fig 2). From these graphics we can see that in this very simple example the policy amounts to the following simple rules, which depend only on the site status and not on the status of the neighboring fields: if field *i* is uninfected (site state 1) then use the high-profit susceptible crop (action 1); if field *i* is infected (site state 2) then use the low-profit resistant one (action 2).

**Simulating the effect of a policy.** In GMDPtoolbox, the evolution of the cumulative global value of the GMDP policy (*i.e.* the truncated infinite horizon discounted value) can be obtained by Monte Carlo approximation, using simulations (Fig 3A). The GMDPtoolbox also provides a graphical representation of the instantaneous global value, *i.e.* the expectation of the (discounted) sum of rewards at a given time step, over all sites (Fig 3B).
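
The principle of this Monte Carlo approximation can be sketched as follows for the three-field toy problem, under the simple rule "susceptible crop if uninfected, resistant crop if infected". This is an independent Python illustration, not the GMDPtoolbox code: the function names, the parameter values and the independence assumption behind the infection probability are all assumptions of the sketch.

```python
import random

# Neighborhoods of the toy problem: N_1 = {1, 2}, N_2 = {1, 2, 3}, N_3 = {2, 3}.
NEIGHBORS = {1: (1, 2), 2: (1, 2, 3), 3: (2, 3)}


def policy(state):
    """Susceptible crop (1) if the field is uninfected, resistant crop (2) otherwise."""
    return {i: 1 if s == 1 else 2 for i, s in state.items()}


def step(state, action, p_eps, p_c, rng):
    """Sample next year's states (1 = uninfected, 2 = infected)."""
    nxt = {}
    for i, s in state.items():
        if action[i] == 2:
            nxt[i] = 1
        elif s == 2:
            nxt[i] = 2
        else:
            m = sum(state[j] == 2 for j in NEIGHBORS[i] if j != i)
            p_inf = p_eps + (1 - p_eps) * (1 - (1 - p_c) ** m)
            nxt[i] = 2 if rng.random() < p_inf else 1
    return nxt


def global_reward(state, action, r_m):
    """Sum of local profits with r_0 = 0."""
    return sum(
        (r_m if state[i] == 1 else r_m / 2) for i in state if action[i] == 1
    )


def mc_value(init, horizon, n_sims, gamma, p_eps, p_c, r_m, seed=0):
    """Average truncated discounted sum of global rewards over n_sims runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        state, discount = dict(init), 1.0
        for _ in range(horizon):
            action = policy(state)
            total += discount * global_reward(state, action, r_m)
            discount *= gamma
            state = step(state, action, p_eps, p_c, rng)
    return total / n_sims


# Assumed parameter values, for illustration only.
start = {1: 1, 2: 1, 3: 1}
print(mc_value(start, horizon=50, n_sims=500, gamma=0.95,
               p_eps=0.05, p_c=0.3, r_m=100.0))
```

Truncating the horizon is harmless when the discount factor makes the remaining terms negligible; the number of simulated trajectories controls the variance of the estimate.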

Other functions enable quantification of the contribution of each site to the cumulative global value, and the time spent by each site in the different possible states.

**General information.** GMDPtoolbox relies on the free toolbox graphViz4Matlab for three functions that display graphs. One of the solution functions relies on the Matlab Optimization toolbox. Furthermore the two solution functions can be accelerated by using the Matlab Parallel Computing toolbox. A complete description of GMDPtoolbox is available at http://www.inra.fr/mia/T/GMDPtoolbox and the source code is available from Matlab Central (https://fr.mathworks.com/matlabcentral/fileexchange/49101-graph-based-markov-decision-processes-gmdp-toolbox) or the project Forge (https://mulcyber.toulouse.inra.fr/projects/gmdptoolbox).

## Application of the GMDP framework to the long-term collective management of an airborne disease at the landscape level

### Description of the problem

There is a need to limit the structural dependency of European agriculture on pesticides while maintaining satisfactory levels of production and income for farmers. The use of resistant cultivars is the cornerstone of Agroecological Protection against plant pathogens. However, these resistances can be overcome within a few years [29]. There is therefore a need for tools that help design collective management policies which do not rely only on resistant cultivars and mobilize several management levers instead.

In order to illustrate the interest of the GMDP framework and GMDPtoolbox in this context, we consider a simplified management problem that focuses on the long-term collective management of an important disease worldwide: blackleg on canola, caused by the *Leptosphaeria maculans / biglobosa* species complex [30]. Epidemics of blackleg on canola are initiated by infected stubble that remains on the soil surface after harvest of canola and produces ascospores after a period of maturation. These spores are wind-dispersed and can produce leaf spots on seedlings and young canola plants under conditions favorable to infection [30]. Once the fungus has infected a leaf, it systemically colonizes the plant and produces a canker, located at the basal stem and the crown, that develops after winter. Control of blackleg on canola mainly relies on the use of cultivars with specific and/or quantitative resistances and on cultural controls. Whenever possible, soil tillage should be adopted to reduce the quantity of available primary inoculum [31]. Because of spore dispersal, trying to contain the disease at the field level only is not sufficient. Collective policies at a regional level should be more efficient and more sustainable.

We designed a qualitative model that represents the impact of cropping practices on epidemics of blackleg on canola and the changes over time of a *Leptosphaeria maculans* population. A three-year rotation is assumed for the entire region: fields are successively sown with canola, then wheat, then barley. This is a typical rotation in France. Primary inoculum is produced in wheat fields from infected stubble left on the soil surface after the harvest of canola. Then spores reach neighbor canola fields by dispersion. The genetic structure of the pathogen population is described in terms of proportion of virulent pathotypes to the considered specific resistance. Then the qualitative model of the pathogen spatio-temporal dynamics corresponds to a downgrading of the SIPPOM-WOSR model [32]. This model describes the effects of cropping systems and their spatial arrangement at the landscape level, along with the effects of weather on the genetic structure of *L. maculans* populations, epidemics, and yield losses on canola.

We considered 3 management levers (*action variables* in the GMDP framework): cultivar choice (2 choices: with or without a specific resistance), canola management plan (2 choices: favorable or unfavorable to blackleg; these cultural modes differ in terms of soil nitrogen content, sowing date and sowing density [32, 33]) and tillage (2 choices: plowing or not after the harvest of canola). Each canola field can thus be managed with 2^{3} = 8 possible options. These actions are applied to canola fields on a yearly basis. The wheat and barley fields are assumed to be managed to provide the same annual harvest over years. Using a simple damage function, yield losses were estimated and the economic performances of cropping practices were calculated as a function of economic drivers (*i.e.* crop management cost and canola prices).

The GMDPtoolbox is used to test the effect of 3 contrasted policies corresponding to 3 different attitudes to control blackleg. These policies are compared to the policy computed by the ALP algorithm (referred to as the GMDP policy). All these policies adapt the action choice on each field, on a yearly basis, as a function of the neighboring field states or indicators calculated at the regional scale. The policies are evaluated with regard to the cumulative global gross margin of farmers in the considered region, on a long term basis.

### The GMDP model

The variables of the model are listed in Table 3.

**State of field** *i*. The state *s*_{i} of field *i* is either canola (c), barley (b) or represented by a pair (*I*_{i}, *V*_{i}) if the field is sown with wheat (which will contain canola stubble from the previous year), where:

- *I*_{i} ∈ {1, 2, 3} is the level of inoculum on infected stubble in field *i*, to be interpreted as low, medium or high.
- *V*_{i} ∈ {1, 2, 3} is the level of percentage of virulent spores on infected stubble in field *i*, to be interpreted as low, medium or high.

The correspondence between global states numbering and the values taken by *I*_{i} and *V*_{i} variables is given in Table 4.

**Action in field** *i*. For fields in wheat or barley, no specific action is applied. For a field *i* in canola, the action *a*_{i} is a triple (*CC*_{i}, *CM*_{i}, *W*_{i}) corresponding to:

- *CC*_{i} (cultivar choice), equal to resistant (encoded by 1) or susceptible (encoded by 2).
- *CM*_{i} (crop management). Two crop management plans are considered: practice 1 (cautious) decreases the risk of infection (with an early sowing date, a low soil nitrogen content and a low sowing density) while practice 2 (standard crop management) has a higher infection risk.
- *W*_{i} (plowing), equal to 1 if tillage operations before canola sowing include plowing and 2 otherwise.

When the considered field is a wheat field, 9 states are possible (3 possible levels of primary inoculum production x 3 proportions of virulent spores against a given specific resistance). Thus, in total, each state variable *s*_{i} has 11 possible values. When a field state variable is in state “canola”, there are 8 possible action variable values (2 possible tillage operations times 2 possible cultivar choices times 2 crop managements), while there is only one available action value when the field is in the 10 other possible states. The correspondence between actions numbering and the values taken by *CC*_{i}, *CM*_{i} and *W*_{i} variables is given in Table 5.
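
These counts can be checked by enumerating the local state and action sets. The Python sketch below does so; the tuple encoding and the enumeration order are assumptions made for illustration and do not reproduce the numbering of Tables 4 and 5.

```python
from itertools import product

LEVELS = (1, 2, 3)  # low, medium, high

# Wheat states are pairs (I, V); canola and barley are single states.
wheat_states = [("wheat", I, V) for I, V in product(LEVELS, LEVELS)]
states = [("canola",), ("barley",)] + wheat_states

# Canola actions are triples (CC, CM, W), each coded 1 or 2.
canola_actions = list(product((1, 2), repeat=3))


def available_actions(state):
    """8 actions for a canola field, a single 'no action' value otherwise."""
    return canola_actions if state[0] == "canola" else [None]


print(len(states))          # 11
print(len(canola_actions))  # 8
```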

**Neighborhood relations, N.** The fields are modeled as the cells of a regular grid. We define the neighbors of field *i* as the four closest fields (north, south, east, west). This results from the assumption that the landscape experiences only four wind directions (north-south, south-north, east-west, and west-east).
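
Such a four-neighbor structure on a regular grid can be built as in the following Python sketch. The treatment of border fields is not specified in the text; the sketch assumes that they simply have fewer neighbors (no wrap-around), and the function name and (row, column) indexing are hypothetical.

```python
def grid_neighbors(n_rows, n_cols):
    """Map each cell (row, col) to its north, south, east and west neighbors.

    Assumption: no wrap-around, so border cells have 2 or 3 neighbors.
    """
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c + 1), (r, c - 1)]
            neighbors[(r, c)] = [
                (rr, cc) for rr, cc in candidates
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbors


nbrs = grid_neighbors(10, 10)
print(len(nbrs[(0, 0)]), len(nbrs[(0, 5)]), len(nbrs[(5, 5)]))  # 2 3 4
```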

**Transition probability function, p.** The succession of events that occur in a field during a cropping season is represented graphically in Fig 4. When field *i* in year *t* is a wheat field, it will be a barley field the next year, while when field *i* at year *t* is a barley field, it will be a canola field the next year (both transitions are deterministic). When field *i* at year *t* is a canola field, it will be a wheat field the next year, and the transition to pairs (*I*_{i}, *V*_{i}) is stochastic and depends on the state of the neighboring fields of *i* which are in wheat at year *t*, and on action *a*_{i}. In this case, *p*_{i}(*s*′_{i}|*s*_{Ni}, *a*_{i}) becomes *p*_{i}((*I*′_{i}, *V*′_{i})|{(*I*_{j}, *V*_{j}), *j* ∈ *NW*_{i}}, *a*_{i}), where *NW*_{i} is the set of indices of the neighboring fields of *i* sown with wheat when *i* is a canola field. The complete definition of the transition probability function is given in the Supporting Information S1 Appendix.

**Rewards.** The yearly reward at the field level is defined as the gross margin: the difference between the income from crop production selling, and the production costs. Its expression depends on the cultivated crop (the parameters of the reward function are described in Table 6): If at time *t* field *i* is a wheat field
If at time *t* field *i* is a barley field
If at time *t* field *i* is a canola field
with

We considered 500 m x 500 m square fields. This choice is justified by the fact that canola fields further than 500 m away from primary inoculum sources are considered safe with regard to major *Leptosphaeria maculans* infections [34]. With this size of fields, only neighboring fields with infected stubble at the soil surface can infect a given canola field. Income from wheat and barley fields (2008-2012 French average yield times 2008-2012 average selling price), as well as canola selling price, were taken from a FAO database (http://faostat3.fao.org). Production costs for wheat and barley fields were estimated by adding operating costs and mechanization labor costs [35]. Attainable canola yields with different crop management plans and cultivar susceptibilities were hypothesized. The three relative yield losses were calculated using the damage function proposed by [32] assuming a Disease Index of 1, 3, and 7 for low, medium and high inoculum levels respectively. Costs of canola management plans were estimated by associating cropping operations [36] with their costs [35] and adding them up (see Table 6).

### The three contrasted policies and the GMDP policy

We consider 3 contrasted policies that correspond to very distinct crop management policies for canola. The first one, the *cultural control* policy, never uses the resistant cultivar and always applies the canola management plan which is the least favorable to blackleg together with plowing: *CC* = 2, *CM* = 1, *W* = 1. On the contrary, the *systematic* policy relies on a permanent use of the resistant cultivar, without plowing and with standard canola management plan: *CC* = 1, *CM* = 2, *W* = 2. Finally, the *integrated* policy is adaptive: if a canola field has no neighbor fields in wheat then the chosen cultivar is sensitive, associated to a standard canola management plan and simplified tillage (action *CC* = 2, *CM* = 2, *W* = 2). Otherwise, if either the maximal level of inoculum or the maximal percentage of virulent spores among the neighbor fields in wheat is in the highest state (*i.e.* 3), cautious decisions are applied: use of resistant cultivar associated to cautious canola management plan and plowing after harvest of canola (*CC* = 1, *CM* = 1, *W* = 1). In all other situations, the sensitive cultivar is used, in association with the canola management plan least favorable to blackleg together with plowing (action *CC* = 2, *CM* = 1, *W* = 1).
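
The three contrasted policies are simple enough to be written as decision rules. The Python sketch below encodes them as described above, with actions as triples (*CC*, *CM*, *W*); the function and constant names are hypothetical, and the input format (a list of (*I*, *V*) pairs, one per neighbor field in wheat) is an assumption made for illustration.

```python
# Actions are triples (CC, CM, W), each coded 1 or 2 as in the text.
CULTURAL_CONTROL = (2, 1, 1)  # susceptible cultivar, cautious plan, plowing
SYSTEMATIC = (1, 2, 2)        # resistant cultivar, standard plan, no plowing


def integrated_policy(wheat_neighbors):
    """Adaptive rule for a canola field.

    wheat_neighbors: list of (I, V) pairs, one per neighbor field in wheat.
    """
    if not wheat_neighbors:
        return (2, 2, 2)      # no wheat neighbor: no precaution
    max_inoculum = max(I for I, _ in wheat_neighbors)
    max_virulence = max(V for _, V in wheat_neighbors)
    if max_inoculum == 3 or max_virulence == 3:
        return (1, 1, 1)      # highest risk: resistant cultivar, cautious plan, plowing
    return (2, 1, 1)          # otherwise: same action as the cultural control policy


print(integrated_policy([]))                # (2, 2, 2)
print(integrated_policy([(1, 1), (3, 1)]))  # (1, 1, 1)
print(integrated_policy([(2, 2), (1, 1)]))  # (2, 1, 1)
```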

It is not straightforward to interpret the optimized GMDP policy from its expression as a function, since the sum over fields of the possible states of the neighborhood is equal to 980,000. We observed that the advocated action (when the field is sown with canola) is to choose the sensitive cultivar, standard canola management plan and simplified tillage (action *CC* = 2, *CM* = 2, *W* = 2) in 88% of the cases and to add plowing (action *CC* = 2, *CM* = 2, *W* = 1) in the other 12%. To understand which characteristics of the neighbor states lead to one choice or the other, a CART (Classification And Regression Tree) model was fitted (Matlab function fitctree). We found that the GMDP policy was very well summarized by the following rule (the precision of the CART model was 97%).

**IF** (i) the average of the level of inoculum is low and the average of the level of virulent spores is not high or (ii) the average of the level of inoculum is low and the average of virulent spores is high and less than 4 neighbor fields are in wheat or (iii) the average of the level of inoculum is medium and the average of virulent spores is medium and only one neighbor field is in wheat

**THEN** cultivar is sensitive, associated to a standard canola management plan and simplified tillage (action *CC* = 2, *CM* = 2, *W* = 2)

**ELSE** add plowing (action *CC* = 2, *CM* = 2, *W* = 1).

It can be noted that the long-term optimized GMDP policy does not make use of the cultivar resistance but relies on plowing.

### Comparison of policies’ long term efficiency

We considered a regular grid of 10 by 10 fields. The first year, the land use is as follows: the top left field is in canola, and from left to right and top to bottom we repeat the same pattern, canola then wheat then barley. Then crop rotation applies yearly as described above. In this configuration, a canola field always has 2 wheat neighbors and 2 barley neighbors. The first year, all wheat fields have a low inoculum production level and a low percentage of virulent spores. For each of the 4 policies tested, we simulated 100 trajectories of length 100 of the GMDP model. Average proportions of use of each action and average proportions of state of wheat fields are plotted on Fig 5.
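
The initial land use and the rotation can be reproduced with a few lines of Python, which also makes it possible to check the statement about the neighbors of a canola field. The sketch assumes that fields are numbered row by row, that the canola, wheat, barley pattern continues across row ends, and that border fields have fewer than four neighbors; the check is therefore restricted to interior fields. All names are hypothetical.

```python
CROPS = ("canola", "wheat", "barley")  # rotation order: canola -> wheat -> barley
N = 10  # grid side


def crop(r, c, year=0):
    """Crop of field (r, c): pattern repeated left to right, top to bottom,
    then shifted by one step of the rotation each year."""
    return CROPS[(r * N + c + year) % 3]


def neighbor_crops(r, c, year=0):
    """Crops of the north, south, west and east neighbors inside the grid."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return sorted(
        crop(rr, cc, year) for rr, cc in candidates
        if 0 <= rr < N and 0 <= cc < N
    )


# Every interior canola field has 2 wheat and 2 barley neighbors, every year.
ok = all(
    neighbor_crops(r, c, year) == ["barley", "barley", "wheat", "wheat"]
    for year in range(3)
    for r in range(1, N - 1)
    for c in range(1, N - 1)
    if crop(r, c, year) == "canola"
)
print(ok)  # True
```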

A bar indicates the proportion of times modality 1 of a given action is applied in a canola field. Modality 1 for actions CC, CM and W corresponds respectively to the choice of the resistant cultivar, the application of a canola management plan unfavorable to blackleg, and plowing.

Code and results are available from FigShare (https://dx.doi.org/10.6084/m9.figshare.3759465.v1).

We observed that even though the *integrated* policy prescribed the use of the resistant cultivar for certain states of the neighborhood, these states were never reached in the simulations, and the *integrated* policy was able to maintain low levels of both inoculum and proportion of virulent spores. So in practice, the *integrated*, *cultural control* and GMDP policies succeeded in avoiding the development of blackleg without using the resistant cultivar while, as expected, under the *systematic* policy, the resistance was broken down and the inoculum level reached state 3 (see Fig 6). With the GMDP policy, the state with a medium level of inoculum and a low percentage of virulent spores (*I* = 2, *V* = 1) was sometimes reached, which was not the case with the *integrated* and *cultural control* policies. However, the mean reward per field and per year of the GMDP policy (2503 €) was about 11% larger than that of the *integrated* (2259 €) and *cultural control* (2256 €) policies and 44% larger than that of the *systematic* policy (1735 €).

These results depend on the underlying assumptions of our model, in particular that the dynamics of the disease are perfectly described by the GMDP transition model. It is also assumed that: i) the status of every field is perfectly known; ii) decision-making is coordinated at the regional level; iii) fields are evenly sized and evenly spaced; iv) weather conditions are always conducive to sporulation; v) access to information has no cost. These assumptions are obviously strong, but they make it possible to model the behaviour of epidemics at a regional scale in a simple way. Further modeling developments could be made in order to address these shortcomings. The results also depend on the parameter values used for the simulations (described in Table 6). A sensitivity analysis of the GMDP policy and of the policies it was compared with, with respect to the model parameter values, could point out situations where the use of the resistant cultivar is useful.
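The relative differences between policies can be recomputed directly from the mean rewards reported above. The short Python sketch below does only this arithmetic; the variable and function names are hypothetical. It gives 10.8% and 10.9% for the *integrated* and *cultural control* policies and 44.3% for the *systematic* policy.

```python
# Mean reward per field and per year (in euros), as reported in the text.
mean_reward = {
    "GMDP": 2503,
    "integrated": 2259,
    "cultural control": 2256,
    "systematic": 1735,
}


def relative_gain(reference, other):
    # Fraction by which `reference` exceeds `other`.
    return (reference - other) / other


gains = {
    name: relative_gain(mean_reward["GMDP"], value)
    for name, value in mean_reward.items()
    if name != "GMDP"
}

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"GMDP vs {name}: {100 * gain:.1f}% higher")
```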

Axis I indicates the level of inoculum production in the field (1 = low, 2 = medium, 3 = high), and axis V indicates the percentage of virulent spores in the inoculum (1 = low, 2 = medium, 3 = high).

## Conclusion

In this article, we have presented the first toolbox dedicated to the Graph-Based Markov Decision Process (GMDP) framework. GMDPtoolbox provides a Matlab structure to encode GMDP problems, as well as modeling tools, solution algorithms, and analysis tools for evaluating and comparing policies (either arbitrary policies or policies obtained with the provided GMDP solution algorithms). In addition, GMDPtoolbox provides a didactic toy example and an illustrative example describing a problem of sustainable collective management of a plant disease at the landscape scale. This toolbox complements the set of available toolboxes for solving factored MDPs (FMDPs). SPUDD [37] and APRICODD [38] are JAVA software packages implementing, respectively, exact and approximate solution approaches to FMDPs, based on Algebraic Decision Diagrams. The aGrUM C++ library [39] also implements solution algorithms for factored MDPs. A limitation of these toolboxes is that their frameworks and algorithms can handle only a flat representation of the action space, whereas in GMDPtoolbox the action space is multidimensional. In the blackleg of canola problem, a flat representation would mean choosing one global action among 8^{100}, instead of choosing 100 actions, each among 8 possible ones, as in the GMDP model. This is clearly not tractable. The F^{3}MDP Matlab solver [19] removes this limitation and can model any FA-MDP, not just GMDPs. However, its computation time is in general longer than that of GMDPtoolbox.
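To give a sense of scale for the flat action space mentioned above, the following Python sketch compares the number of global actions with the number of local choices in the factored representation. It is plain arithmetic on the figures given in the text (100 fields, 8 actions per field); the variable names are hypothetical.

```python
n_fields = 100          # fields in the blackleg of canola example
actions_per_field = 8   # possible local actions in each field

# Flat representation: one global action is a combination of all local actions.
flat_action_count = actions_per_field ** n_fields

# Factored representation: each field chooses among its own 8 actions,
# so only n_fields * actions_per_field local options are ever enumerated.
factored_choice_count = n_fields * actions_per_field

# Number of decimal digits of the flat action count.
digits = len(str(flat_action_count))

if __name__ == "__main__":
    print("flat action space size has", digits, "decimal digits")
    print("local options in the factored representation:", factored_choice_count)
```

The flat count is a 91-digit number, against 800 local options in the factored form, which illustrates why solvers restricted to a flat action space cannot be applied to this problem.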

GMDPtoolbox can be useful for two different types of users. Researchers in Artificial Intelligence can test newly developed algorithms for sequential decision-making in a spatial context and compare them, on the included examples, with the algorithms provided in the toolbox (using, for example, the toolbox analysis functions). They can also use the toolbox for teaching purposes.

Modelers can use it to support expert thinking in many applied fields (including ecology and agriculture). The process of modeling an applied management problem in the GMDP framework and building transition and reward functions is already useful in itself for better understanding the problem under consideration. This is something that we observed with specialists in forestry management [25], plant disease control [23] and ecology [26]. The analysis of the solutions (policies) obtained by applying GMDP solution algorithms then gives further insight into the studied problem, and sometimes generates new ways of managing spatio-temporal processes or confirms the quality of the policies proposed by experts.

Several extensions of GMDPtoolbox can be explored. The first one is to include the GMDP solution algorithm proposed in [21]. It is based on approximate Value Iteration and approximates the value function using a *Belief Propagation* algorithm. Another natural extension would be to develop an R version of the toolbox. The R language, initially developed for statistical analysis, is a GNU language that is now widely used by modelers for computation and data analysis. An R version of GMDPtoolbox would make it possible to reach a larger community of modelers.

To conclude, we have illustrated the usefulness of GMDPtoolbox for modeling and solving management problems in agroecology. The scope of applications goes beyond this field, since the interactions do not need to be spatial. The framework is also suited, for example, to networks of social relationships, as in viral marketing applications [21], or to computer networks, as in computer virus control.

## Supporting information

### S1 Appendix. Transition probability functions for the GMDP model of management of blackleg of canola.

https://doi.org/10.1371/journal.pone.0186014.s001

(PDF)

## References

- 1. Wilson KA, McBride MF, Bode M, Possingham H. Prioritizing global conservation efforts. Nature. 2006;440:337–340. pmid:16541073
- 2. Lescourret F, Magda D, Richard G, Adam-Blondon AF, Bardy M, Baudry J, et al. A social–ecological approach to managing multiple agro-ecosystem services. Current Opinion in Environmental Sustainability. 2015;14:68–75.
- 3. Duru M, Therond O, Martin G, Martin-Clouaire R, Magne M, Justes E, et al. How to implement biodiversity-based agriculture to enhance ecosystem services: a review. Agronomy for Sustainable Development. 2015;35(4):1259–1281.
- 4. Souchère V, Millair L, Echeverria J, Bousquet F, Le Page C, Etienne M. Co-constructing with stakeholders a role-playing game to initiate collective management of erosive runoff risks at the watershed scale. Environmental Modelling & Software. 2010;25(11):1359–1370.
- 5. Polasky S, Nelson E, Camm J, Csuti B, Fackler P, Lonsdorf E, et al. Where to put things? Spatial land management to sustain biodiversity and economic returns. Biological Conservation. 2008;141(6):1505–1524.
- 6. Jin L, Zhang H, Lu Y, Yang Y, Wu K, Tabashnik B, et al. Large-scale test of the natural refuge policy for delaying insect resistance to transgenic Bt crops. Nature Biotechnology. 2015;33(2):169–174. pmid:25503384
- 7. Rossall S. Fungicide Resistance in Crop Protection: Risk and Management. Plant Pathology. 2012;61(4):820–820.
- 8. Sridhar V, Lokeshwari D, Latha K, Chakravarthy A. Insecticide resistance management: reflections and way forward. Current science. 2014;107(10):1640–1642.
- 9. Coble H, Schroeder J. Call to Action on Herbicide Resistance Management. Weed Science. 2016;64(1):661–666.
- 10. Meir E, Andelman S, Possingham H. Does conservation planning matter in a dynamic and uncertain world. Ecology Letters. 2004;(7):615–622.
- 11. Puterman M. Markov Decision Processes. John Wiley & Sons; 1994.
- 12. Sigaud O, Buffet O, editors. Markov Decision Processes in Artificial Intelligence. Wiley; 2010.
- 13. Murphy K. Markov decision process toolbox for Matlab. Available from: http://www.cs.ubc.ca/~murphyk/Software/MDP/mdp.html.
- 14. Chadès I, Chapron G, Cros MJ, Garcia F, Sabbadin R. MDPtoolbox: a multi-platform toolbox to solve stochastic dynamic programming problems. Ecography. 2014;37(9):916–920. Toolbox available from: http://www.inra.fr/mia/T/MDPtoolbox.
- 15. Cordwell S. Markov decision process toolbox for Python. Available from: https://github.com/sawcordwell/pymdptoolbox.
- 16. Fackler P. MDPsolve, software for dynamic optimization. Available from: https://sites.google.com/site/mdpsolve.
- 17. Guestrin C, Koller D, Parr R. Multiagent Planning with Factored MDPs. Proceedings of Advances in Neural Information Processing Systems (NIPS); 2001. p. 1523–1530.
- 18. Kim KE, Dean T. Solving Factored MDPs with Large Action Space using Algebraic Decision Diagrams. Proceedings of the 7th Pacific Rim International Conference on Artificial Intelligence (PRICAI). 2002. p. 80–89.
- 19. Radoszycki J, Peyrard N, Sabbadin R. Solving F3MDPs: collaborative multiagent Markov decision processes with factored transitions, rewards and stochastic policies. Proceedings of International Conference on Principles and Practices of Multi Agent systems (PRIMA); 2015. Code available from: https://mulcyber.toulouse.inra.fr/projects/f3mdpsolver/.
- 20. Sabbadin R, Peyrard N, Forsell N. A framework and a mean-field algorithm for the local control of spatial processes. International Journal of Approximate Reasoning. 2012;53(1):66–86.
- 21. Cheng Q, Liu Q, Chen F, Ihler A. Variational Planning for Graph-Based MDPs. Proceedings of Advances in Neural Information Processing Systems (NIPS); 2013. p. 2976–2984.
- 22. Murphy K. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, School of Computer Science, University of California, Berkeley; 2002.
- 23. Peyrard N, Sabbadin R, Lô-Pelzer E, Aubertot JN. A graph-based Markov decision process framework applied to the optimization of policies for integrated management of diseases. Proceedings of American Phytopathological Society and Society of Nematologist joint meeting; 2007.
- 24. Choisy M, Peyrard N, Sabbadin R. A probabilistic decision framework to optimize the dynamics of a network evolution: application to the control of childhood diseases. Proceedings of European Conference on Complex Systems (ECCS); 2007.
- 25. Forsell N, Wikström P, Garcia F, Sabbadin R, Blennow K, Eriksson LO. Management of the risk of wind damage in forestry: a graph-based Markov decision process approach. Annals of Operations Research. 2011;190(1):57–74.
- 26. Nicol S, Chades I, Peyrard N, Sabbadin R. An optimal approach to managing two-species competition stopping the Gambusia fish invasion of Edgbaston Mound springs. Proceedings of the 27th International Congress for Conservation Biology (ICCB); 2015.
- 27. Bertsekas DP, Tsitsiklis JN. Neuro-Dynamic Programming. Belmont, Massachusetts: Athena Scientific; 1996.
- 28. de Farias DP, Van Roy B. The linear programming approach to approximate dynamic programming. Operations Research. 2003;51(6):850–865.
- 29. Aubertot J, West J, Bousset-Vaslin L, Salam M, Barbetti M, Diggle A. Improved resistance management for durable disease control: A case study of phoma stem canker of oilseed rape (Brassica napus). European Journal of Plant Pathology. 2006;114(1):91–106.
- 30. West J, Kharbanda P, Barbetti M, Fitt B. Epidemiology and management of *Leptosphaeria maculans* (phoma stem canker) on oilseed rape in Australia, Canada and Europe. Plant Pathology. 2001;50(1):10–27.
- 31. Schneider O, Roger-Estrade J, Aubertot J, Dore T. Effect of seeders and tillage equipment on vertical distribution of oilseed rape stubble. Soil & Tillage Research. 2006;85(1-2):115–122.
- 32. Lo-Pelzer E, Bousset L, Jeuffroy M, Salam M, Pinochet X, Boillot M, et al. SIPPOM-WOSR: A Simulator for Integrated Pathogen POpulation Management of phoma stem canker on Winter OilSeed Rape I. Description of the model. Field Crops Research. 2010;118(1):73–81.
- 33. Aubertot J, Pinochet X, Doré T. The effects of sowing date and nitrogen availability during vegetative stages on *Leptosphaeria maculans* development on winter oilseed rape. Crop Protection. 2004;23(7):635–645.
- 34. Marcroft S, Sprague S, Pymer S, Salisbury P, Howlett B. Crop isolation, not extended rotation length, reduces blackleg (*Leptosphaeria maculans*) severity of canola (Brassica napus) in south-eastern Australia. Australian Journal of Experimental Agriculture. 2004;44:601–606.
- 35. Attoumani-Ronceux A, Piskiewicz N, Guichard L. Notice du calculateur STEPHY 1.4. 2012. French.
- 36. Puech T, Schott C, Mignolet C. Actualisation de la base de données Agricole Régionalisée sur le bassin SEIne-NormandiE pour l’analyse récente des pratiques agricoles (ARSEINE: 2006-2014). Quelle Agriculture pour demain” Rapport de fin de phase 6 du PIREN Seine. Juin 2015. French.
- 37. Hoey J, St-Aubin R, Hu A, Boutilier C. SPUDD: Stochastic Planning using Decision Diagrams. Proceedings of the International Conference of Uncertainty in Artificial Intelligence (UAI); 1999. Toolbox available from: https://cs.uwaterloo.ca/~jhoey/research/spudd.
- 38. St-Aubin R, Hoey J, Hu A, Boutilier C. APRICODD: Approximate policy construction using decision diagrams. Proceedings of Advances in Neural Information and Processing Systems (NIPS); 2000. p. 1089–1095. Toolbox available from: https://cs.uwaterloo.ca/~jhoey/research/spudd.
- 39. aGrUM toolbox: a Graphical Universal Model. Available from: https://forge.lip6.fr/projects/agrum/wiki.