Value of sample information in dynamic, structurally uncertain resource systems

Few if any natural resource systems are completely understood and fully observed. Instead, there almost always is uncertainty about the way a system works and its status at any given time, which can limit effective management. A natural approach to uncertainty is to allocate time and effort to the collection of additional data, on the reasonable assumption that more information will facilitate better understanding and lead to better management. But the collection of more data, either through observation or investigation, requires time and effort that often can be put to other conservation activities. An important question is whether the use of limited resources to improve understanding is justified by the resulting potential for improved management. In this paper we address directly a change in value from new information collected through investigation. We frame the value of information in terms of learning through the management process itself, as well as learning through investigations that are external to the management process but add to our base of understanding. We provide a conceptual framework and metrics for this issue, and illustrate them with examples involving Florida scrub-jays (Aphelocoma coerulescens).


Introduction
Few if any natural resource systems are completely understood and fully observed. Instead, there is almost always uncertainty about the way a system works and its status at any given time, which can limit effective management (Williams and Johnson [1]). A natural approach to uncertainty is to allocate time and effort to the collection of data, on the assumption that more information will facilitate better understanding and lead to better management. But the collection of more data, either through observation or investigation, requires time and effort that could be put to other activities such as on-the-ground conservation. An important question is whether the use of limited resources to improve understanding is justified by the potential to improve management (Doremus [2]). This question is often asked by managers but rarely if ever answered satisfactorily, though some authors (e.g., McAllister and Pikitch [3], McAllister and Kirkwood [4]) have used expected resource valuations to contrast different monitoring strategies.
There is by now a well-developed theory and approach for assessing the value of information in decision making. Raiffa and Schlaifer [5] provided one of the first treatments of the value of information (VOI), coining the name and developing many of its key expressions. Since then many publications have offered descriptions of the value of information (e.g., Quirk [6], Dakins et al. [7], Yokota and Thompson [8][9], Canessa et al. [10], Williams and Johnson [11]). Keisler et al. [12] provide a comprehensive review of applications of value of information analyses. Several metrics for the value of information are recognized (Yokota and Thompson [8]):
• The expected value of perfect information utilizes an average of optimal model-specific values, averaged over the model likelihoods. The metric consists of this average net of the optimal value in the presence of process uncertainty.
• The expected value of partial information concerns the value added by eliminating uncertainty from a single source, assuming there is more than one source of uncertainty.
• Finally, the expected value of sample information expresses the potential gain in value from the collection of less than perfect information, using a comparison of optimal valuation with additional information against valuation in its absence.
A general framework for the value of information that includes perfect, partial and sample information in sequential decision making for natural resources is described by Williams et al. [13].
The expected value of perfect information has been used in a growing number of applications in natural resource management (e.g., Conroy et al. [14], Mäntyniemi et al. [15], Williams et al. [13]), and several applications address the expected value of partial information (e.g., Moore and Runge [16], Johnson et al. [17], Maxwell et al. [18], and Johnson et al. [19]). On the other hand, the number of examples addressing the expected value of sample information is more restricted (e.g., Runge et al. [20], Moore et al. [21], Grantham et al. [22]). Few VOI applications in natural resources deal with dynamic resource systems, in which actions are dependent both on the state of the system and the degree of uncertainty in system dynamics (e.g., Shea et al. [23], Williams and Johnson [11,24], and Moore et al. [21]). Somewhat surprisingly, there are almost no examples for dynamic systems that address the expected value of sample information, even though many resource problems are fundamentally dynamic and a typical monitoring situation involves production of less than perfect information.
In this article we address the change in value from sample information collected through investigation in dynamic decision making. We frame the value of information in terms of learning through the management process itself, as well as learning through investigations that are external to management but add to our base of understanding. Our objective is to extend valuation to include dynamic decision making with sources of data that are both internal and external to the management process. The framework developed here goes beyond current treatments of the value of sample information in the literature, in its emphasis on management and learning about dynamic natural resources.
In what follows the value of information is described in a context of sequential decision making under uncertainty, with future resource conditions and future understanding potentially influenced by current decisions. We focus specifically on structural uncertainty, that is, uncertainty about the processes that control resource dynamics. Partial observability (Williams [25]), another recognized and important source of uncertainty, can also be addressed by considering additional resources to improve estimates of resource status. However, we emphasize structural uncertainty in this paper, and point the reader to expositions in the literature on valuation under partial observability (Fackler [26], Williams and Johnson [24] and references therein). We provide two examples of the value of sample information based on the management of habitat for the Florida scrub-jay (Aphelocoma coerulescens).

Decisions, returns, and uncertainty
Among other things, the value associated with sequential decision making under process or structural uncertainty depends on the amount of that uncertainty. With greater understanding one can make more informed (and higher valued) decisions; with less understanding, progress toward achieving resource goals and objectives is limited.
Here we assume a managed natural resource (e.g., a landscape, an amphibian population, a butterfly colony, the number of vegetative organisms in an area) that is subject to only partial understanding. Uncertainty about how the resource system works is expressed by means of different hypotheses (models) about the system and its responses to management actions. Each model has a measurable likelihood of being the most appropriate, based on current information and understanding (Williams et al. [27]).
We also assume a range of different management actions (e.g., different seeding mixes, harvest strategies, water control regimes, geographic locations), with time-specific actions influencing the transition of the resource from its current state to a future state, and generating returns that provide a basis for comparing different management actions. Once an action is taken and a transition is made to a new state, another action is taken, and another return is generated at that time. The trajectory of anticipated returns depends on which hypothesis (model) is most appropriate, and therefore inherits the model uncertainty.
The challenge in such a situation is to recognize and measure the change in value resulting from an increase in information and understanding. A broadly accepted measure of change is given by a comparison of optimal valuation produced with additional information, against optimal valuation in its absence (Raiffa and Schlaifer [5]). An understanding of the change in value enables assessment of cost-effectiveness in targeting uncertainty with additional research or monitoring.

Decision making under structural uncertainty
A framework for the expected value of sample information under dynamic decision making applies to resources that are subject to management through time. Both resource status and management interventions are seen as fluctuating through time, with the system state and action at time $t$ influencing system behavior going forward. Here we summarize the components of learning-based management under structural uncertainty. The necessary notation is highlighted in Table 1.

Table 1. Notation used to characterize dynamic decision making and valuation under structural uncertainty.
Symbol | Description
$t$ | Time index for a range of times constituting the time frame. The index is assumed here to take positive integer values, from some time $t_0$ through time $T$ that may be infinite.
$x_t$ | System state (e.g., size, density, spatial coverage). Because the system is assumed to change through time its state is time-specific.
$k$ | Model index for $k = 1, \ldots, K$ models representing different hypotheses about system dynamics.
$q_t$ | Vector $(q_t(1), q_t(2), \ldots, q_t(K))$ of model-specific probabilities, with $q_t(k)$ the probability that model $k$ best represents the system at time $t$.
$a_t$ | Action taken as a result of decision making. Because they are taken through time, actions are time-indexed.
$A_t$ | Policy that specifies a particular action for each system state and model state at each time starting at time $t$ in the time frame.
$R(a_t, x_t)$ | Return incorporating the costs and benefits of action $a_t$ when the system is in state $x_t$.

System dynamics. State transitions are described in terms of Markov decision processes (MDP) (Puterman [28], Williams et al. [27]). If $x_t$ and $a_t$ are the state and action at a particular time $t$ and $x_{t+1}$ is the state at the next time, then the probability of transition from $x_t$ to $x_{t+1}$ is $P(x_{t+1} \mid x_t, a_t)$. Under structural uncertainty the decision process is not completely understood, i.e., the transition probabilities in $P(x_{t+1} \mid x_t, a_t)$ are uncertain (Williams [29], Williams and Brown [30]). Different Markovian models $P_k(x_{t+1} \mid x_t, a_t)$ are used along with model probabilities $q_t(k)$ to account for structural uncertainty. The model state $q_t = (q_t(1), q_t(2), \ldots, q_t(K))$ evolves through time as information accumulates via monitoring, and an average of model-specific transition probabilities based on $q_t$ produces model-averaged transition probabilities
$$\bar{P}(x_{t+1} \mid x_t, a_t, q_t) = \sum_{k} q_t(k)\, P_k(x_{t+1} \mid x_t, a_t).$$
Decision making. A policy $A_t$ of actions over time frame $\{t, \ldots, T\}$ consists of actions $A(x_t, q_t)$ for each system and model state at each time $t$ in the time frame. Policy $A_t$ can be characterized sequentially by action $a_t$ at time $t$, followed thereafter by the remainder $A_{t+1}$ of the policy over $\{t+1, \ldots, T\}$:
$$A_t = \{a_t, A_{t+1}\}.$$
Propagating uncertainty. The dynamics of the model state are driven by information produced over time that is either internal or external to management. Information for internal updating comes from within the management process itself, in the spirit of adaptive management (Nichols and Williams [31]). Bayes' theorem (Lee [32]) can be used for updating uncertainty, based on system state transitions from $x_t$ to $x_{t+1}$:
$$q_{t+1}(k) = \frac{q_t(k)\, P_k(x_{t+1} \mid x_t, a_t)}{\bar{P}(x_{t+1} \mid x_t, a_t, q_t)}. \tag{1}$$
Uncertainty also can be updated with information from outside the management process, that is, from experimentation or tracking that is effectively independent of decision making (Williams [33]). In this case resource data $z_t$ are acquired through external investigation, with Bayes' theorem again used for updating uncertainty based on model-specific data distributions $P_k(z_t)$:
$$q'_t(k) = \frac{q_t(k)\, P_k(z_t)}{\sum_{j} q_t(j)\, P_j(z_t)}. \tag{2}$$
Uncertainty updating with both sources of information factors into the expected value of sample information with dynamic decision making.
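The model averaging and the two Bayesian updates described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the two-state, two-action, two-model system, its transition probabilities, and the data likelihoods are hypothetical numbers chosen only to make the arithmetic easy to follow, and the function names are our own.

```python
from typing import List, Sequence

# models[k][a][x][y] = P_k(x_{t+1} = y | x_t = x, a_t = a).
# All numbers below are hypothetical and purely illustrative.
Models = List[List[List[List[float]]]]


def averaged_row(q: Sequence[float], models: Models, a: int, x: int) -> List[float]:
    """Model-averaged transition probabilities P_bar(. | x, a, q)."""
    n_states = len(models[0][a][x])
    return [sum(q[k] * models[k][a][x][y] for k in range(len(q)))
            for y in range(n_states)]


def update_internal(q: Sequence[float], models: Models,
                    a: int, x: int, x_next: int) -> List[float]:
    """Bayes update of the model state from an observed transition x -> x_next."""
    weights = [q[k] * models[k][a][x][x_next] for k in range(len(q))]
    total = sum(weights)  # equals P_bar(x_next | x, a, q)
    if total <= 0.0:
        raise ValueError("observed transition has zero probability under every model")
    return [w / total for w in weights]


def update_external(q: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Bayes update from external data; likelihoods[k] is the data probability under model k."""
    weights = [qk * lk for qk, lk in zip(q, likelihoods)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("data have zero probability under every model")
    return [w / total for w in weights]


MODELS: Models = [
    [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]],  # model 0
    [[[0.9, 0.1], [0.3, 0.7]], [[0.2, 0.8], [0.1, 0.9]]],  # model 1
]
q0 = [0.5, 0.5]
row = averaged_row(q0, MODELS, a=1, x=0)
q_internal = update_internal(q0, MODELS, a=1, x=0, x_next=1)
q_external = update_external(q0, likelihoods=[0.2, 0.6])
```

In this toy setting the two models differ only under action 1, so a transition observed under action 0 leaves the model state unchanged; learning requires an action whose predicted outcomes differ among models.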
Valuation. Strategy valuation for this problem is based on the accrual of returns $R(a_t, x_t)$ through time, with each return incorporating the costs and benefits corresponding to action $a_t$ when the system is in state $x_t$. A value function $V(A_t \mid x_t, q_t)$ for decision making aggregates returns starting at time $t$:
$$V(A_t \mid x_t, q_t) = E\left[\sum_{\tau=t}^{T} R(a_\tau, x_\tau) \,\middle|\, x_t, q_t\right], \tag{3}$$
where the expectation accounts for stochastic transitions among states as well as the structural uncertainty represented by multiple models and their likelihoods.
Step-wise updating of the value function is given by
$$V(A_t \mid x_t, q_t) = R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q_t)\, V(A_{t+1} \mid x_{t+1}, q_{t+1}). \tag{4}$$
The expression $V(A_t \mid x_t, q_t)$ serves as a value or objective function by which to compare and contrast the effectiveness of different management strategies.
Learning-based management. Decision making with internal learning as described above characterizes an adaptive approach to management (Williams [29]), whereby adjustments to decision making occur as understanding improves with the ultimate goal of improved management (Walters [34]). Adaptive management is promoted through a sequence of (i) decision making and taking actions, (ii) followed by monitoring of system responses, (iii) followed by assessment of data, (iv) followed by the integration of what is learned into future decision making (Fig 1).
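The four-step cycle described above can be illustrated with a short simulation. The sketch below is only a schematic under stated assumptions: the two-model system and returns are hypothetical, the action rule is a myopic stand-in (current return plus the best expected return one step ahead) rather than an optimal policy, and one of the candidate models is assumed to generate the data.

```python
import random

# Hypothetical system: MODELS[k][a][x][y] = P_k(x_{t+1} = y | x_t = x, a_t = a).
MODELS = [
    [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]],  # model 0
    [[[0.9, 0.1], [0.3, 0.7]], [[0.2, 0.8], [0.1, 0.9]]],  # model 1
]
RETURNS = [[0.0, 1.0], [-0.3, 0.7]]  # RETURNS[a][x], illustrative values


def p_bar(q, a, x):
    """Model-averaged transition probabilities for state x and action a."""
    return [sum(q[k] * MODELS[k][a][x][y] for k in range(len(q))) for y in range(2)]


def choose_action(q, x):
    """Myopic stand-in for a policy: current return plus best expected next return."""
    def score(a):
        row = p_bar(q, a, x)
        return RETURNS[a][x] + sum(row[y] * max(RETURNS[b][y] for b in range(2))
                                   for y in range(2))
    return max(range(2), key=score)


def simulate(q, x, true_model, steps, seed=0):
    """Run the decide / monitor / assess / update cycle and record the model state."""
    rng = random.Random(seed)
    history = [list(q)]
    for _ in range(steps):
        a = choose_action(q, x)                                    # (i) decide and act
        x_next = rng.choices(range(2), weights=MODELS[true_model][a][x])[0]  # (ii) monitor
        w = [q[k] * MODELS[k][a][x][x_next] for k in range(len(q))]          # (iii) assess
        total = sum(w)
        q = [v / total for v in w]                                 # (iv) fold into next decision
        x = x_next
        history.append(list(q))
    return history


history = simulate([0.5, 0.5], x=0, true_model=1, steps=20)
```

Each entry of `history` is a model state; how quickly it moves toward the generating model depends on how often the chosen actions discriminate among models.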
Adaptive management can be either active or passive, with active adaptive management incorporating the potential for learning directly into the process of decision making (Williams [35]). Optimal decision making is given by
$$V[x_t, q_t] = \max_{a_t} \left\{ R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q_t)\, V[x_{t+1}, q_{t+1}] \right\}, \tag{5}$$
where the updated model state $q_{t+1}$ in $V[x_{t+1}, q_{t+1}]$ indicates the use of learning in identification of strategy (see Appendix).
On the other hand, passive adaptive management can be described in terms of the absence of an explicit accounting of learning in the choice of strategy (Williams [35]):
$$V[x_t, q_t] = \max_{a_t} \left\{ R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q_t)\, V[x_{t+1}, q_t] \right\}, \tag{6}$$
where the prior model state $q_t$ in $V[x_{t+1}, q_t]$ indicates an absence of learning in the identification of decisions (see Appendix). In the development below, the expected value of sample information is described in terms of both active and passive adaptive management.
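The contrast between active and passive valuation can be made concrete with a finite-horizon recursion. The sketch below is a minimal illustration under our own assumptions: a short horizon with zero terminal value, direct recursion instead of a discretized model state (so it is practical only for very small problems), and hypothetical transition probabilities and returns. The only difference between the two modes is whether the model state carried forward is the Bayesian update or the prior.

```python
def p_bar(q, models, a, x):
    """Model-averaged transition probabilities P_bar(. | x, a, q)."""
    return [sum(q[k] * models[k][a][x][y] for k in range(len(q)))
            for y in range(len(models[0][a][x]))]


def bayes(q, models, a, x, y):
    """Posterior model state after observing the transition x -> y under action a."""
    w = [q[k] * models[k][a][x][y] for k in range(len(q))]
    total = sum(w)
    return [v / total for v in w]


def optimal_value(x, q, horizon, models, returns, active=True):
    """Finite-horizon optimal value by direct recursion (terminal value zero)."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in range(len(returns)):
        future = 0.0
        for y, p in enumerate(p_bar(q, models, a, x)):
            if p > 0.0:
                # Active: carry the updated model state; passive: keep the prior.
                q_next = bayes(q, models, a, x, y) if active else q
                future += p * optimal_value(y, q_next, horizon - 1, models, returns, active)
        best = max(best, returns[a][x] + future)
    return best


# Hypothetical system: MODELS[k][a][x][y], RETURNS[a][x].
MODELS = [
    [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]],  # model 0
    [[[0.9, 0.1], [0.3, 0.7]], [[0.2, 0.8], [0.1, 0.9]]],  # model 1
]
RETURNS = [[0.0, 1.0], [-0.3, 0.7]]

v_active = optimal_value(0, [0.5, 0.5], 4, MODELS, RETURNS, active=True)
v_passive = optimal_value(0, [0.5, 0.5], 4, MODELS, RETURNS, active=False)
```

With one or two decision periods remaining the two recursions coincide, because the carried-forward model state cannot yet affect any later choice; they also coincide when the model state is degenerate, since there is nothing left to learn.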

Combining internal and external learning in EVSI
Under sequential decision making, an approach to the expected value of sample information is to include information internal to the management process as above, along with experimentally generated information from outside the management process. The value obtained can then be compared with the value produced with internal learning only, to assess the net benefit of the experimentation. The learning process in this situation involves updating the prior model state $q_t$ to $q'_t$ with external information as in Eq (2), and then using the update in iterative valuation as in Eq (5).
In combination, external and internal learning can accelerate the rate of learning, by allowing the model state to be updated prior to its use in optimal valuation. Preposterior updating (Berger [36]) with probabilities for the data $z_t$ is given by
$$V[x_t, q'_t] = \max_{a_t} \left\{ R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q'_t)\, V[x_{t+1}, q_{t+1}] \right\} \tag{7}$$
(see Appendix), with the posterior model state $q_{t+1}$ based on the prior model state $q'_t$. Preposterior updating provides a measurement of value before data $z_t$ are known and actions are taken. The expected value of sample information is then expressed as the difference
$$\mathrm{EVSI} = \sum_{z_t} \bar{P}(z_t \mid q_t)\, V[x_t, q'_t] - V[x_t, q_t], \tag{8}$$
where $\bar{P}(z_t \mid q_t)$ is the model-averaged probability of the data (the normalizing term in Eq (2)). The first term in Eq (8) is an average optimal valuation from Eq (7) resulting from the updating of the model state with external data. The second term is an optimal valuation based on the current system and model state. The difference expresses the marginal value expected with new sample information. EVSI can be seen to be state-dependent, in that the value given by the comparison in Eq (8) is conditional on the particular combination $(x_t, q_t)$ of system and model states. That is, different combinations of system and model states can produce different values.
The use of passive adaptive management in EVSI proceeds in much the same way, except that the updating of model state in the decision making process involves the use of $q'_t$ rather than $q_{t+1}$ in the valuation:
$$V[x_t, q'_t] = \max_{a_t} \left\{ R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q'_t)\, V[x_{t+1}, q'_t] \right\}. \tag{9}$$
As above, the difference between active and passive adaptive management is the incorporation of anticipated learning in active adaptive management, as reflected in the updated model state $q_{t+1}$ in the value term $V[x_{t+1}, q_{t+1}]$ in Eq (7).
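The preposterior calculation behind EVSI can be sketched as follows for the actively adaptive case. This is an illustration under assumptions of our own: external data are reduced to a small set of discrete outcomes with hypothetical model-specific probabilities (`data_lik`), the horizon is short with zero terminal value, and the system, returns, and function names are invented for the example.

```python
def p_bar(q, models, a, x):
    """Model-averaged transition probabilities P_bar(. | x, a, q)."""
    return [sum(q[k] * models[k][a][x][y] for k in range(len(q)))
            for y in range(len(models[0][a][x]))]


def bayes(q, models, a, x, y):
    """Internal update of the model state after the transition x -> y."""
    w = [q[k] * models[k][a][x][y] for k in range(len(q))]
    total = sum(w)
    return [v / total for v in w]


def optimal_value(x, q, horizon, models, returns):
    """Actively adaptive optimal value by direct recursion (terminal value zero)."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in range(len(returns)):
        future = 0.0
        for y, p in enumerate(p_bar(q, models, a, x)):
            if p > 0.0:
                future += p * optimal_value(y, bayes(q, models, a, x, y),
                                            horizon - 1, models, returns)
        best = max(best, returns[a][x] + future)
    return best


def evsi(x, q, horizon, models, returns, data_lik):
    """Preposterior average of optimal values minus the current optimal value.

    data_lik[k][z] is the (hypothetical) probability of external outcome z under model k.
    """
    average = 0.0
    for z in range(len(data_lik[0])):
        p_z = sum(q[k] * data_lik[k][z] for k in range(len(q)))  # averaged data probability
        if p_z > 0.0:
            q_prime = [q[k] * data_lik[k][z] / p_z for k in range(len(q))]  # external update
            average += p_z * optimal_value(x, q_prime, horizon, models, returns)
    return average - optimal_value(x, q, horizon, models, returns)


MODELS = [
    [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]],  # model 0
    [[[0.9, 0.1], [0.3, 0.7]], [[0.2, 0.8], [0.1, 0.9]]],  # model 1
]
RETURNS = [[0.0, 1.0], [-0.3, 0.7]]

gain = evsi(0, [0.5, 0.5], 3, MODELS, RETURNS, data_lik=[[0.8, 0.2], [0.3, 0.7]])
no_gain = evsi(0, [0.5, 0.5], 3, MODELS, RETURNS, data_lik=[[0.5, 0.5], [0.5, 0.5]])
```

Data that are equally likely under every model leave the model state unchanged and therefore add no value, which is a convenient sanity check on any implementation.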
A simple illustration of the use of internal and external information involves adaptive management on provincial lands of a particular ecological type, while an investigation under fixed management is conducted on a nearby federal conservation area of the same type. Assuming that monitoring and model state updating with Eq (2) occur somewhat earlier on the federal lands, information from the updating can be made available to inform decision making on the provincial lands. If the resource situation at the two locations is similar in its biological structures and environmental drivers, then folding what is learned on the federal lands into learning-based decision making on the provincial lands (Eqs (5) and (6)) should increase the rate of learning on the provincial lands, and lead to a more rapid improvement in their management. EVSI at any point in the decision process is simply the comparison of an average valuation that incorporates the external information against the valuation without it (Eq (8)).

Example: Habitat management for the Florida scrub-jay
The Florida scrub-jay is an endemic species that is designated as threatened under the Endangered Species Act (Root [37], Stith et al. [38]). Scrub-jays are restricted to Florida scrub (hereafter, "scrub"), which is a rare habitat characterized by evergreen, xeromorphic shrubs including oaks, repent palms (Serenoa repens, Sabal etonia), and ericaceous shrubs (Lyonia spp., Vaccinium spp.) (Foster and Schmalzer [39]). Scrub is maintained by frequent fire, and landscape fragmentation and fire suppression have resulted in many scrub communities that are no longer capable of supporting scrub-jay populations (Breininger and Carter [40]). Prescribed burning has thus become the primary management tool in reserves where the viability of scrub-jays and other scrub species is an important objective.
Of the many scrub attributes affecting scrub-jay demography (Breininger et al. [41]), perhaps the most important is scrub height (Breininger et al. [42], Breininger and Carter [40]). Scrub height is classified as short (<120 cm), optimal (120-170 cm), or tall-mix (>170 cm) (Breininger and Carter [40]). Short and optimal-height scrub are further classified as open (>50% of the scrub containing bare ground) or closed. Optimal-height scrub acts as a reliable source habitat for jays, whereas the other classes always act as demographic sinks (Breininger and Oddy [43]). The goal of a manager is to maximize the cumulative demographic performance of scrub-jays over time, net of the cost of conducting prescribed burns.
For the purposes of this example, we assume a management unit that is homogeneous, with one-year transition probabilities for each scrub class along with do-nothing and prescribed-burn management actions (S1 File). We also allow for an intensive burn to ensure that the entire management unit is effectively burned. Our null model posits that routine and intensive burns are equally effective (or ineffective) at setting back succession, though an intensive burn is more expensive due to the need to guard against greater threats to infrastructure and public safety. Thus, an intensive burn is never optimal under the null model. The alternative model posits that intensive burns are more effective at setting back succession than routine burns, and thus would be used when their greater short-term cost is offset by greater demographic performance of scrub-jays over the long term. The optimal, actively adaptive policy is depicted in Fig 2, in which the optimal management action is a function of both scrub state (i.e., system state) and the probability of the null model (i.e., model state). The optimal action can be an intensive burn as long as there is at least some probability (≥ 0.002) of the alternative model being correct. But even in those cases, an intensive (and more expensive) burn is only optimal for the most fire-resistant states (short-closed, optimal-closed, and tall-mix). State- and action-specific transition probabilities and returns, and computational details for the actively adaptive policy, are provided in Supporting Information (S1 File).
In this example we assume that data external to the management process are available, and we wish to know the contribution of the external data for improving the management process. Suppose a researcher has the ability to observe the effect of an intensive burn at another site prior to decision making for the management unit in question. We first calculated the Expected Value of Perfect Information (EVPI; Johnson and Williams [44]) (see Appendix), and then calculated EVSI for each combination of scrub state and probability of the null model according to Eq (9). Some authors (e.g., Walters [34], Moore and McCarthy [45]) have observed that EVPI is often low in practice, which is the case in our scrub-jay example (Fig 3). Expressed as a percentage gain in expected objective value, the value of eliminating model uncertainty is always < 1%. This can actually be good news for a manager, in that there is little incentive to eliminate model uncertainty; a management policy based on an average model may be sufficient. As expected, values of EVPI are considerably higher than those of EVSI, and are at a maximum in the interior of the model state. EVPI is uniformly higher for tall-mix, which is the scrub state most resistant to fire. In contrast, EVSI is uniformly higher for short-closed, suggesting that experimenting with intensive burns in this scrub state would provide the greatest short-term gain in management performance. However, observing a single intensive burn that is external to the management process provides little advantage, because the null and alternative models have broad overlap in their transition probabilities (see Supporting Information), and thus model discrimination is very difficult.
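For readers who wish to reproduce the form of the EVPI calculation (though not the scrub-jay results, which depend on the transition probabilities and returns in S1 File), the sketch below computes EVPI as the model-probability-weighted average of model-specific optimal values minus the optimal value under uncertainty. The two-state system, returns, short horizon, and function names are hypothetical assumptions made for illustration.

```python
def p_bar(q, models, a, x):
    """Model-averaged transition probabilities P_bar(. | x, a, q)."""
    return [sum(q[k] * models[k][a][x][y] for k in range(len(q)))
            for y in range(len(models[0][a][x]))]


def bayes(q, models, a, x, y):
    """Posterior model state after observing the transition x -> y."""
    w = [q[k] * models[k][a][x][y] for k in range(len(q))]
    total = sum(w)
    return [v / total for v in w]


def optimal_value(x, q, horizon, models, returns):
    """Actively adaptive optimal value by direct recursion (terminal value zero)."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in range(len(returns)):
        future = 0.0
        for y, p in enumerate(p_bar(q, models, a, x)):
            if p > 0.0:
                future += p * optimal_value(y, bayes(q, models, a, x, y),
                                            horizon - 1, models, returns)
        best = max(best, returns[a][x] + future)
    return best


def evpi(x, q, horizon, models, returns):
    """Average of model-specific optimal values minus the optimal value under uncertainty."""
    n_models = len(q)
    certain = 0.0
    for k in range(n_models):
        unit = [1.0 if j == k else 0.0 for j in range(n_models)]  # model k known to be correct
        certain += q[k] * optimal_value(x, unit, horizon, models, returns)
    return certain - optimal_value(x, q, horizon, models, returns)


# Hypothetical system: MODELS[k][a][x][y], RETURNS[a][x].
MODELS = [
    [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]],  # model 0
    [[[0.9, 0.1], [0.3, 0.7]], [[0.2, 0.8], [0.1, 0.9]]],  # model 1
]
RETURNS = [[0.0, 1.0], [-0.3, 0.7]]

evpi_by_state = {x: evpi(x, [0.5, 0.5], 4, MODELS, RETURNS) for x in (0, 1)}
```

Evaluating `evpi` over a grid of model states for each system state gives the kind of state-dependent profile summarized in Fig 3, here for the toy system only.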

Constraints on the sequencing of monitoring
A somewhat different approach to EVSI with sequential decision making involves monitoring that can be less frequent than decision making. Consider resource management in which actions are chosen annually, whereas monitoring can be conducted either biennially or annually. Under these conditions one can meaningfully assess the value of the additional information produced by annual rather than biennial monitoring. The question is how much value would be added.
To determine the value produced by the additional monitoring, we compare valuation with annual versus biennial monitoring. In any year $t$, valuation for annual monitoring is given by Eq (4), with optimal valuation shown in Eq (5):
$$V(A_t \mid x_t, q_t) = R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q_t)\, V(A_{t+1} \mid x_{t+1}, q_{t+1})$$
and
$$V[x_t, q_t] = \max_{a_t} \left\{ R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q_t)\, V[x_{t+1}, q_{t+1}] \right\}.$$
Because system status is observed every year, valuations in successive years $t$ and $t+1$ have the same form, with the value function for $t+1$ replicating that for year $t$ simply by incrementing the time index by 1:
$$V(A_{t+1} \mid x_{t+1}, q_{t+1}) = R(a_{t+1}, x_{t+1}) + \sum_{x_{t+2}} \bar{P}(x_{t+2} \mid x_{t+1}, a_{t+1}, q_{t+1})\, V(A_{t+2} \mid x_{t+2}, q_{t+2})$$
and
$$V[x_{t+1}, q_{t+1}] = \max_{a_{t+1}} \left\{ R(a_{t+1}, x_{t+1}) + \sum_{x_{t+2}} \bar{P}(x_{t+2} \mid x_{t+1}, a_{t+1}, q_{t+1})\, V[x_{t+2}, q_{t+2}] \right\}.$$
The situation is somewhat different for biennial monitoring, where the system state is observed in a given year $t$, not observed in the subsequent year $t+1$, observed again in year $t+2$, and so on. Because the observed states $x_t$ and $x_{t+2}$ can be combined with model-specific transition probabilities to determine model state $q_{t+2}$ by Bayes' theorem (Williams and Johnson 2017), one can compute a 2-step value function (Eq (10)), which in turn can be maximized over $A_t = \{a_t, a_{t+1}, A_{t+2}\}$ to produce $V[x_t, q_t]$ for each combination of system and model states.
For a year $t$ in which biennial monitoring occurs, the valuation in Eq (10) can be shown to be identical to valuation in Eq (4) for annual monitoring (Williams and Johnson [46]). It follows that there is no difference in value between the monitoring scenarios, i.e., no value is added in switching from biennial to annual monitoring in a year $t$ in which biennial monitoring occurs. On the other hand, for year $t+1$ when biennial monitoring does not occur, there is a difference in the valuations for annual and biennial monitoring, because $x_{t+1}$ and $q_{t+1}$ are not identified in the latter scenario. However, $x_{t+1}$ and $q_{t+1}$ are related stochastically to $x_t$ and $q_t$, which are known through monitoring. Averaging over the transition probabilities $\bar{P}(x_{t+1} \mid x_t, a_t, q_t)$ produces a valuation for year $t+1$, and using $a^*_t$, $a^*_{t+1}$ and $A^*_{t+2}$ from the optimization of $V(A_t \mid x_t, q_t)$ in Eq (10) produces the optimal valuation for year $t+1$ (Williams and Johnson [46]).
The change in valuation for the two monitoring scenarios is therefore given by a comparison of the valuation $V[x_{t+1}, q_{t+1}]$ for annual monitoring and the average valuation $\bar{V}[x_t, q_t, a^*_t]$ for biennial monitoring:
$$V[x_{t+1}, q_{t+1}] - \bar{V}[x_t, q_t, a^*_t]. \tag{13}$$
This measure of value, which is directly related to an increase in the frequency of monitoring, can prove useful to managers in determining whether to reduce annual to biennial monitoring, or to expand biennial to annual monitoring.
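The model-state update across an unmonitored year can be sketched as below. This is one plausible reading rather than the authors' computing form: we assume the unobserved intermediate state is summed out, so that each model's two-step transition probability is a matrix product of its one-step matrices for the two actions taken, and Bayes' theorem is then applied to the pair of observed states. The system and its probabilities are hypothetical.

```python
def two_step(P_first, P_second):
    """Two-step transition matrix: sum over the unobserved intermediate state."""
    n = len(P_first)
    return [[sum(P_first[x][y] * P_second[y][z] for y in range(n)) for z in range(n)]
            for x in range(n)]


def update_biennial(q, models, a_t, a_next, x_t, x_t2):
    """Model state at t+2 from states observed at t and t+2 only.

    models[k][a] is the one-step transition matrix of model k under action a;
    a_next is chosen without observing the state in year t+1.
    """
    weights = [q[k] * two_step(models[k][a_t], models[k][a_next])[x_t][x_t2]
               for k in range(len(q))]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("observed pair of states has zero probability under every model")
    return [w / total for w in weights]


# Hypothetical system: MODELS[k][a][x][y].
MODELS = [
    [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]],  # model 0
    [[[0.9, 0.1], [0.3, 0.7]], [[0.2, 0.8], [0.1, 0.9]]],  # model 1
]
q_t2 = update_biennial([0.5, 0.5], MODELS, a_t=1, a_next=1, x_t=0, x_t2=1)
```

In this example the two-step probabilities of moving from state 0 to state 1 are 0.56 and 0.88 under the two models, so equal prior weights update to 7/18 and 11/18.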

Example: Habitat monitoring for the Florida scrub-jay
The Florida scrub-jay management problem described above can be used to illustrate the effect of an increased monitoring frequency. We calculated actively adaptive management policies for annual and biennial monitoring schemes (Table 2). The marginal value in Eq (13) varies depending on system and model state; in fact it is negative for some states (McDonald and Smith [47]). Because an average of the optimal values $V[x_{t+1}, q_{t+1}]$ is compared against an optimal value for one particular state $x_{t+1}$ that may be included in that average, Eq (13) may be negative or positive, depending on both the transition probabilities $\bar{P}(x_{t+1} \mid x_t, a^*_t, q_t)$ and the associated optimal valuations in Eq (12). Consider, for example, a system and model state combination $(x_{t+1}, q_{t+1})$ that can be reached from $(x_t, q_t)$. If $V[x_{t+1}, q_{t+1}]$ is large but the corresponding probability $\bar{P}(x_{t+1} \mid x_t, a_t, q_t)$ is small, the comparison in Eq (13) may be positive. On the other hand, a small value $V[x_{t+1}, q_{t+1}]$ coupled with a small probability $\bar{P}(x_{t+1} \mid x_t, a_t, q_t)$ may produce a negative value.
More data from annual monitoring should produce increased value on average over the long term, a result borne out by long-term simulations that account for the likelihood of occurrence for different states (S1 Fig). Nonetheless, the advantage of annual monitoring over biennial monitoring appears to be very small in this example, probably because of the strong relationship between states in successive years. This confirms the intuitive result that there is little to be gained from the frequent monitoring of slowly changing ecosystems.

Table 2 (table body not recoverable here). Column headings: model state $q_t$; annual monitoring; biennial monitoring; EVSI. Notes: system states $x_t$ are numbered, with the recoverable labels being (2) short-closed; (3) optimal-open; (4) optimal-closed; and (5) tall-mix. Model state $q_t$ is the probability of the null model, which posits that an intensive burn is no more effective at restoring optimal-height scrub than a routine burn. Optimal actions $a^*$ are: (1) do nothing; (2) routine burn; and (3) intensive burn. Sometimes the biennial-monitoring policy $a^*_{t+1} \mid x_t, q_t, a^*_t$ has actions that differ from those for the annual-monitoring policy $a^*_t$, because in the $t+1$ years monitoring information is unavailable in the former policy and actions have to be conditioned on the system state, model state, and action for the previous year $t$.

Discussion
There is a long record of advances in understanding the processes influencing resource dynamics, in modeling resource behaviors, in the recognition of resource patterns, and in methodologies for resource monitoring and estimation. On the other hand, decision making, including a framework for valuation, continues to lag behind natural resources science, despite the growth in operations research and decision science (Schwartz et al. [48]). A technical framework is needed for the evaluation of costs and consequences of resource decisions, so as to allow a comparative assessment of alternative strategies. With such a framework it then becomes possible to assess the limitations of uncertainty on decision making, and the value of eliminating that uncertainty.
In this paper we offer an assessment framework for strategy valuation that builds on adaptive management and the value of information. The general goal is to facilitate the assessment of monitoring in the decision making process, through the consideration of additional value accruing to additional sampling information. The expected value of sample information serves as a metric by which managers can explicitly compare the benefit of extended data collection against associated opportunity and other costs, thereby facilitating smart decision making based on the efficiency of the additional effort. Advances have been made in recent years in the value of information with one-time decision making. In this paper we expand on that work, to address the relatively common occurrence in natural resources of sequential decision making and monitoring over an extended time frame.
In the above treatment of internal and external monitoring we focused on the marginal value of external data collection, on the assumption that it could supplement an ongoing process of internal monitoring. It should be noted that an analogous assessment is possible, whereby external investigation is ongoing and it is internal monitoring that is considered to be supplemental to it. Framing the issue in this way would allow managers to consider whether to implement (or continue) internal monitoring as part of the management process based on the marginal value of doing so, or to rely on externally collected data only.
As to the cadence of monitoring, we note that it is possible to extend the period between monitoring events so that monitoring occurs less frequently than every other year. Consider the prospect of triennial monitoring, in which a monitoring effort is mounted every 3 years. A computing form for valuation would mirror that shown above, except it would need to account for state transitions over 3 years. Again, the valuations for annual and triennial monitoring would be equivalent for years in which monitoring occurs, but would differ in years when there is no monitoring. However, there would be different valuations for the non-monitoring years, leading to a differential value-added for annual monitoring that would depend on the out-year under consideration.
When using EVSI to explore the value of additional information to resolve uncertainty, it is important not to misinterpret results (Johnson et al. [49]). One such misinterpretation is to conclude that a low value of EVSI means monitoring is unneeded. As indicated above, EVSI is a comparison of an average of optimal values produced with additional sample information, versus the optimal value that is attainable in the absence of additional information (Eq 10). As such it is effectively a marginal analysis, addressing the value of additional monitoring that contributes to an ongoing if imperfect monitoring effort that informs decision making. Monitoring is required for the state-based information on which the optimal resource decision making depends, and the question here is whether additional monitoring is justified by the potential increase in value that would be produced. A decision to increase or decrease the monitoring effort relies on the answer to this question. Whether to terminate monitoring altogether is a quite different question, one that is not addressed by examining the effect of a marginal change in monitoring effort (Williams and Johnson [24]).
Finally, we emphasize that as potentially useful as the value of information is, and in particular EVSI, these metrics only partially characterize the benefit to be derived from the decision framework presented above. Management objectives, potential actions, sources of uncertainty, and forecasts of resource responses provide a decision making "architecture" for post-decision monitoring and assessment that can track resource responses and evaluate progress toward objectives. A technical assessment of the value of the information produced can certainly contribute to informing management. However, the metrics are certainly not the only, and possibly not even the most relevant, measures of value for the decision framework. Among other things, a systematic and structured accounting of the elements of decision making can facilitate collaboration and shared decision making, lowering the potential for contentiousness and conflict among stakeholders (Nichols et al. [50]). The value of information can certainly contribute to, but should not obscure, these and other benefits accruing to a structured process of decision making.

Appendix
• We first consider optimal valuation with internal monitoring. Action taken at each time maximizes the sum of current return and expected future value. Two decision making approaches are active adaptive management and passive adaptive management, and strategy valuation applies to both.
Active adaptive management. Expected future value is based on the updated model state $q_{t+1}$:
$$V[x_t, q_t] = \max_{a_t} \left\{ R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q_t)\, V[x_{t+1}, q_{t+1}] \right\}.$$
Passive adaptive management. Expected future value is based on the current model state $q_t$:
$$V[x_t, q_t] = \max_{a_t} \left\{ R(a_t, x_t) + \sum_{x_{t+1}} \bar{P}(x_{t+1} \mid x_t, a_t, q_t)\, V[x_{t+1}, q_t] \right\}.$$
The expected value of perfect information (EVPI) can be calculated with either approach. EVPI compares the average optimal valuation, assuming complete understanding, against optimal valuation under structural uncertainty:
$$\mathrm{EVPI} = \sum_{k} q_t(k)\, V_k[x_t] - V[x_t, q_t].$$
EVPI is necessarily non-negative (Williams and Johnson 2015b).
• Next we consider optimal valuation with internal and external monitoring. Here we utilize preposterior averaging of optimal adaptive valuations:
Step 1. Update $q_t$ to $q'_t$ using external data $z_t$ as in Eq (2).
Step 2. Use $q'_t$ in the optimal valuation in Eq (5).
Step 3. Average the optimal valuations in step 2 over the data $z_t$ that produce $q'_t$: $R(a_t, x_t) + \ldots$