Adaptive Sampling of Information in Perceptual Decision-Making

In many perceptual and cognitive decision-making problems, humans sample multiple noisy information sources serially, and integrate the sampled information to make an overall decision. We derive the optimal decision procedure for two-alternative choice tasks in which the different options are sampled one at a time, sources vary in the quality of the information they provide, and the available time is fixed. To maximize accuracy, the optimal observer allocates time to sampling different information sources in proportion to their noise levels. We tested human observers in a corresponding perceptual decision-making task. Observers compared the direction of two random dot motion patterns that were triggered only when fixated. Observers allocated more time to the noisier pattern, in a manner that correlated with their sensory uncertainty about the direction of the patterns. There were several differences between the optimal observer predictions and human behaviour. These differences point to a number of other factors, beyond the quality of the currently available sources of information, that influence the sampling strategy.


Ideal Observer Derivation
In this section, the derivation of a likelihood-ratio hypothesis test for determining which of two normally distributed populations has the greater mean is presented. The basis for this test stems from the work of Flehinger et al. [1,2], who proposed a likelihood ratio hypothesis test for comparing treatments in medical trials in which the data from the two treatments are represented by two normally distributed populations with common variance σ². Subsequent contributions to this body of work [3,4,5] provided generalisations of the likelihood ratio test, as well as bounds on the performance achieved in a variety of scenarios. One such generalisation, particularly relevant to the comparative two-alternative decision-making problem outlined in the main text, was made by Hayre and Gittins [4], who generalised the hypothesis test to allow the alternatives under comparison, X and Y, to have unequal variances, denoted σ_x² and σ_y² respectively. With no assumption of common variance made in the formulation of the comparative decision problem, the likelihood ratio test derived herein is based upon this subsequent work of Hayre and Gittins [4].
As stated in the main text, the decision maker is presented with two sources of stimuli, the neural responses to which (the firing rates) are represented by two normally distributed populations X and Y, from which two sequences of observations, x_m = {x_1, ..., x_m} and y_n = {y_1, ..., y_n}, are drawn. Each x_i and y_j is independent and normally distributed, such that x_i ∼ N(μ_x, σ_x²) and y_j ∼ N(μ_y, σ_y²). Stated simply, the goal of the decision maker is to determine which of the two populations has the larger mean value (μ_High). With two alternatives there are two possible states of nature (assuming μ_x ≠ μ_y), either μ_x > μ_y or μ_y > μ_x, which can be described by two hypotheses, H_x and H_y:

H_x: μ_x = θ + δ, μ_y = θ − δ
H_y: μ_x = θ − δ, μ_y = θ + δ

Here θ, the average of the means of the two normally distributed variables, is an unknown quantity, and 2δ, the difference between the means of the two variables, is assumed to be known.
With the decision problem formulated as a choice between two simple hypotheses, H_x and H_y, on the basis of a fixed total number of observations, the likelihood ratio test δ_LLR(x_m, y_n) can be formulated as follows [8]:

δ_LLR(x_m, y_n): accept H_x if Λ(x_m, y_n) ≥ e^η, otherwise accept H_y,    (1.1)

where e^η is the selected decision threshold and Λ(x_m, y_n) is the ratio of the likelihoods of the observations x_m and y_n under the two hypotheses,

Λ(x_m, y_n) = p(x_m, y_n | H_x) / p(x_m, y_n | H_y).
Since the likelihood ratio stated above depends on the unknown parameter θ, it is necessary to calculate a marginalised likelihood ratio which is independent of this parameter. To achieve this without making any additional assumptions, two new sets of variables are introduced, u_m = {u_2, ..., u_m} and v_n = {v_1, ..., v_n}, which are related to the x_i and y_j variables by u_i = x_i − x_1 and v_j = x_1 − y_j. Although the variables x_1, u_i and v_j are each linear combinations of normally distributed random variables, they all depend on the value of x_1 and are therefore no longer independent of one another. Collectively, the probability distribution over the variables is given by the multivariate normal distribution [7]:

p(x | H_i) = (2π)^(−(m+n)/2) |Σ|^(−1/2) exp(−½ (x − μ_Hi)ᵀ Σ⁻¹ (x − μ_Hi)),    (1.2)

where x is a vector containing the variables x_1, u_2, ..., u_m and v_1, ..., v_n, μ_Hi is a vector containing the means of the variables in x under hypothesis H_i, and Σ is the covariance matrix of the variables, as shown in Equations 1.3 and 1.4 respectively.
Here μ_x|Hi, μ_u|Hi and μ_v|Hi are the mean values of the variables x_1, u and v under hypothesis H_i, and the elements of the covariance matrix Σ are listed below and collected in matrix form in Equation 1.4.
Populating the covariance matrix Σ with these covariance terms yields Equation 1.4. To calculate the inverse of the covariance matrix (Equation 1.5), it is first necessary to perform an LU decomposition, which yields two triangular matrices, a lower triangular matrix L and an upper triangular matrix U, whose product recovers the original covariance matrix, Σ = LU [9].
Here, the large off-diagonal zeros indicate that every element of L and U for which a value has not been specified is zero.
With the covariance matrix decomposed, its inverse can be calculated in two stages. First, the inverse of L is calculated. The inverse of Σ is then given by finding the matrix Σ⁻¹ which satisfies U Σ⁻¹ = L⁻¹ [9].
Starting with the lower triangular matrix L, it can be seen that the matrix is an atomic-triangular matrix, a special form of triangular matrix in which all off-diagonal elements are zero with the exception of a single column which contains non-zero values.
The inverse of such a matrix is found by simply replacing its non-zero off-diagonal elements with their additive inverses. Substituting the result into U Σ⁻¹ = L⁻¹ yields the following result: With the exception of the first row of the matrix U, each remaining row contains only a single non-zero element. As such, the inverse matrix Σ⁻¹ can be calculated with relative ease through backwards substitution, yielding: Substituting these terms into the exponent of the multivariate normal distribution (Equation 1.2), the following result is obtained: Next, setting μ_u = 0, which holds regardless of hypothesis, and rearranging to collect terms in x_1, the following is obtained: Using the standard Gaussian integral ∫_{−∞}^{∞} exp(−ax² + bx + c) dx = √(π/a) exp(b²/(4a) + c), the parameter x_1 can then be integrated out of the likelihood function, yielding a marginal likelihood of e^(f_Hi), the exponent of which is shown below: Here μ_v = μ_x − μ_y, which is equal to 2δ when hypothesis H_x is the true state of nature, and −2δ when hypothesis H_y is the true state of nature. Thus, with μ_v = ±2δ depending on the hypothesis, the majority of terms in f_Hi cancel when calculating the log-likelihood ratio ln Λ(x_m, y_n) = f_Hx − f_Hy.
Returning to the original variables x_i and y_j, the log-likelihood ratio can be rewritten in the form used in the main text: Finally, the decision function δ(x_m, y_n) from Equation 1.1 can be rewritten in terms of the log-likelihood ratio, LLR(x_m, y_n), as shown below: Here the decision boundary η controls the level of bias inherent in the decision rule. With the log-likelihood ratio, a decision boundary of η = 0 corresponds to an unbiased decision rule, in which the decision maker accepts the hypothesis under which the observations were most likely to have occurred. With no prior knowledge of the relative likelihood of the two hypotheses H_x and H_y, a decision boundary of η = 0 shall be used for the ideal observer.
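To make the decision rule concrete, it can be sketched in a few lines of Python. The closed form used here for the marginalised log-likelihood ratio, LLR = 4δ·mn·(x̄ − ȳ)/(mσ_y² + nσ_x²), is a reconstruction consistent with the derivation above, and the stimulus parameters are purely illustrative:

```python
import numpy as np

def llr(x, y, delta, var_x, var_y):
    """Marginal log-likelihood ratio for H_x versus H_y (theta integrated out)."""
    m, n = len(x), len(y)
    return 4.0 * delta * m * n * (np.mean(x) - np.mean(y)) / (m * var_y + n * var_x)

def decide(x, y, delta, var_x, var_y, eta=0.0):
    """Accept H_x when LLR >= eta, otherwise H_y; eta = 0 gives the unbiased rule."""
    return 'H_x' if llr(x, y, delta, var_x, var_y) >= eta else 'H_y'

# Illustrative trial: H_x is the true state of nature (mu_x = theta + delta).
rng = np.random.default_rng(0)
theta, delta = 5.0, 0.5
x = rng.normal(theta + delta, 1.0, size=40)   # low-noise source
y = rng.normal(theta - delta, 2.0, size=60)   # high-noise source
print(decide(x, y, delta, var_x=1.0, var_y=4.0))
```

Note that the sign of the LLR depends on the observations only through x̄ − ȳ; the remaining factor rescales the evidence by the sample counts and noise levels.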

Ideal Observer - Error Rates and Optimal Sampling Allocation
In the previous section a decision function was presented for selecting which of two normally distributed populations has the greatest mean. This decision function, a log-likelihood ratio test, provides the optimal solution to the decision problem for any given decision boundary and set of observations [8].
In this section the effect the sampling strategy, q, has on the expected error rate of the ideal observer is considered. To begin, in section 2.1, the expected error rate of the ideal observer decision maker is derived as a function of the decision parameters and the sampling strategy q. Next, in section 2.2, this expected error rate is used to determine the sampling strategy which minimises the error rate.

Expected Error Rate
As outlined in Section 1, the ideal observer utilises the Log-Likelihood Ratio Test (Log-LRT), δ_LLR(x_m, y_n), to determine which of the two available hypotheses, H_x or H_y, to accept.
Here LLR(x_m, y_n) is the Log-Likelihood Ratio (LLR), the value of which can be calculated from the observations x_m and y_n as follows:

LLR(x_m, y_n) = 4δ · mn(x̄ − ȳ) / (mσ_y² + nσ_x²),

where x̄ and ȳ are the sample means of the observations from X and Y. Thus, at a given interrogation time T, the probability of making an error is given by the integral over the distribution of the LLR between −∞ and 0 if H_x is the correct hypothesis, and between 0 and ∞ if H_y is the correct hypothesis. This is shown below in Equation 2.2, where ER_Hx and ER_Hy are used as shorthand to denote the error rates under each of the hypotheses for a given parameterisation of the problem:

ER_Hx = P(LLR < 0 | H_x),  ER_Hy = P(LLR > 0 | H_y).

With LLR(x_m, y_n) given by a linear combination of the observations from the normally distributed random variables X and Y, the LLR is itself a normally distributed random variable. The mean of the LLR, denoted E(LLR), can be calculated using the following expression:

E(LLR) = 4δ · mn(μ_x − μ_y) / (mσ_y² + nσ_x²),

where (μ_x − μ_y) is equal to 2δ when hypothesis H_x is the true state of nature, and −2δ when hypothesis H_y is the true state of nature; thus E(LLR | H_x) = 8δ²mn/(mσ_y² + nσ_x²) = −E(LLR | H_y). Similarly, the variance of the LLR, denoted Var(LLR), can be calculated using the following expression:

Var(LLR) = 16δ²mn / (mσ_y² + nσ_x²).

With these expressions for the mean and variance of the LLR at the decision time T, the decision function's error rates (Equation 2.3) can be stated in terms of the standard normal Cumulative Distribution Function (CDF), where, under either hypothesis, the argument of the normal CDF, −E(LLR|H_i)/√Var(LLR), evaluates to −2δ√(mn/(mσ_y² + nσ_x²)).
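Because the LLR is normally distributed, the expected error rate has a closed form that can be evaluated directly. The sketch below uses m = qT and n = (1 − q)T; the expression ER = Φ(−2δ√(mn/(mσ_y² + nσ_x²))) is a reconstruction consistent with the mean and variance of the LLR, and the parameter values are illustrative:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def error_rate(q, T, delta, var_x, var_y):
    """Expected error rate when a fraction q of the total time T is spent
    sampling X, so that m = q*T and n = (1 - q)*T."""
    m, n = q * T, (1.0 - q) * T
    return phi(-2.0 * delta * sqrt(m * n / (m * var_y + n * var_x)))

# Error falls as more sampling time is available, at a fixed even split.
print(error_rate(0.5, 50, 0.5, 1.0, 4.0))
print(error_rate(0.5, 200, 0.5, 1.0, 4.0))
```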

Optimal Allocation of Samples
Now, with the expected error rate of the ideal observer derived, the relationship between this error rate and the chosen sampling strategy q can be considered.
To achieve this, it is first noted that Φ(·) is a monotonically increasing function of its argument and, as such, the minimum and maximum of the function coincide with the minimum and maximum of the argument, respectively. Considering the two error rates ER_Hx and ER_Hy individually, it can be seen that ER_Hx is minimised when −E(LLR|H_x)/√Var(LLR) is minimised, and ER_Hy is minimised when E(LLR|H_y)/√Var(LLR) is minimised. Since E(LLR|H_x) = −E(LLR|H_y), these two minima coincide. Thus, any sampling strategy that minimises the error rate under one hypothesis also minimises the error rate under the other hypothesis.

Taking the derivative with respect to q of the argument −E(LLR)/√Var(LLR), the result of which is shown below, the optimal sampling strategy q* can be found: If stationary points in the interval [0, 1] are found and one or more of them is a minimum, then by comparing the error rate at each such point the optimal strategy q* can be identified. However, if no minimum exists in the range [0, 1], then the optimal strategy shall lie at either q* = 0 or q* = 1, and further analysis is required. With that noted, the derivative can be simplified by multiplying out the denominators of the two terms on the Right-Hand Side (RHS) of Equation 2.8: From the equation above it appears the derivative has critical points at the following q values: q = 0, q = 1, q = σ_x/(σ_x + σ_y) and q = σ_x/(σ_x − σ_y). To determine which, if any, of these points correspond to valid sampling strategies and to a minimum of the argument requires further analysis. Starting with the points q = 0 and q = 1, it can be seen by substituting these values back into Equation 2.8 that they are not in fact stationary points but correspond to vertical tangents at which the derivative is unbounded. Next, from inspection of the point q = σ_x/(σ_x − σ_y) it can be seen that this point lies outside the interval of valid sampling strategies [0, 1], leaving only the point q = σ_x/(σ_x + σ_y), which does correspond to a valid sampling strategy for any σ_x and σ_y.
To check whether the point q = σ_x/(σ_x + σ_y) is a maximum, a minimum or a point of inflection requires the evaluation of the second derivative d²/dq²[−E(LLR)/√Var(LLR)] at q = σ_x/(σ_x + σ_y), for which the following result is obtained: Substituting q = σ_x/(σ_x + σ_y) into this result and simplifying yields: Inspection of this result reveals that the second derivative of −E(LLR)/√Var(LLR) at the point q = σ_x/(σ_x + σ_y) is always positive. Thus, this point corresponds to a minimum, and the optimal sampling strategy under either hypothesis is q* = σ_x/(σ_x + σ_y).
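The optimum q* = σ_x/(σ_x + σ_y) can also be verified numerically. The sketch below grid-searches the closed-form error rate ER = Φ(−2δ√(mn/(mσ_y² + nσ_x²))) (a reconstruction consistent with the derivation above, with m = qT and n = (1 − q)T; δ, T and the noise levels are illustrative) and compares the grid minimum against the analytic optimum:

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def error_rate(q, T, delta, sig_x, sig_y):
    m, n = q * T, (1.0 - q) * T
    return phi(-2.0 * delta * sqrt(m * n / (m * sig_y**2 + n * sig_x**2)))

sig_x, sig_y = 1.0, 2.0
q_star = sig_x / (sig_x + sig_y)              # predicted optimum: 1/3
grid = [i / 1000.0 for i in range(1, 1000)]   # interior sampling strategies
q_best = min(grid, key=lambda q: error_rate(q, 100, 0.25, sig_x, sig_y))
print(q_star, q_best)                         # the grid minimum sits next to q*
```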

Ideal Observer Model - Switching Costs
In Section 2, when the solution to the decision problem as seen by the ideal observer was derived, it was assumed that the full decision time T was spent observing one or other of the two available sources of stimuli. A consequence of this assumption is that the decision maker must have the ability to instantaneously switch sampling between the two sources of stimuli.
In decisions involving visual stimuli, eye movements form an essential part of how sensory and cognitive attention is controlled during information gathering. For the participants tackling the visual discrimination task outlined in the main text, completion of these eye movements is clearly not instantaneous. Moreover, peri-saccadic suppression is likely to prolong the period in which vision is attenuated beyond the duration of the movement itself [10]. To understand the effect these non-instantaneous switches in fixation have on the behaviour of the decision maker a switching cost, in the form of a loss of sampling time, is introduced into the ideal observer model. This section investigates the effect the switching cost has on the expected behaviour and performance of the ideal observer decision maker.
If we begin by denoting the loss in sampling time per switch by T_c and the number of switches by n_s, then we can reformulate the log-likelihood ratio, LLR_T, as shown below.
Here q represents the proportion of the usable sampling time T − n_sT_c spent drawing observations from alternative X, with the total number of discrete samples drawn by the ideal observer from alternative X given by m = q(T − n_sT_c) and the total number drawn from alternative Y given by n = (1 − q)(T − n_sT_c).
As before (Equation 2.6), the expected error rate for the modified ideal observer can be calculated from the standard normal's CDF, as shown below.
Where the argument of Φ(·) is given as follows: As noted previously in Section 2.2, the expected error rate is minimised under H_x when the argument of Φ(·) is at its minimum value, and is minimised under H_y when the argument is at its maximum value. Inspection of Equation 3.3 shows that increasing the value of n_s leads to an increase in the expected error rate under both hypotheses H_x and H_y. Next, we note an additional property of introducing switching costs: for n_s = 0 the sampling proportion q must equal either 0 or 1, as a switch must occur in order for samples to be drawn from both of the alternatives. Thus, from Equations 3.2 and 3.3, we can see that at n_s = 0 the argument (η − E(LLR|H_i))/√Var(LLR) = 0, and an expected error rate of 50% is achieved. Further inspection shows that whenever T > T_c, setting n_s = 1 with 0 < q < 1 leads to a reduction in error rate compared to that achieved when n_s = 0.
Next, to determine the optimal sampling strategy for a given problem parameterisation, we note that the introduction of a switching penalty has merely reduced the amount of effective sampling time available. Since the optimal sampling strategy q = σ_x/(σ_x + σ_y), derived in the previous section, is independent of sampling time and depends only on the values of σ_x and σ_y, we can conclude that the introduction of a switching penalty has no effect on how the available sampling time should be apportioned between the variables.
We can therefore conclude that, when the values of σ_x and σ_y are known and the switching cost satisfies the inequality T_c < T, the expected error rate is minimised by selecting n_s = 1 and q = σ_x/(σ_x + σ_y).
Relating this optimal one-switch strategy back to the number of samples, m and n, drawn from the two variables X and Y, we find that m = q(T − T_c), n = (1 − q)(T − T_c) and T = m + n + T_c.
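The effect of the switching cost can likewise be explored numerically. The sketch below uses the same reconstructed closed-form error rate as before, with the effective sampling time reduced to T − n_s·T_c; the parameter values are illustrative:

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def error_rate_switch(q, T, n_s, T_c, delta, sig_x, sig_y):
    """Error rate when n_s switches each cost T_c of sampling time."""
    T_eff = T - n_s * T_c
    if n_s < 1 or T_eff <= 0 or not 0.0 < q < 1.0:
        return 0.5    # both sources cannot be sampled: chance performance
    m, n = q * T_eff, (1.0 - q) * T_eff
    return phi(-2.0 * delta * sqrt(m * n / (m * sig_y**2 + n * sig_x**2)))

sig_x, sig_y = 1.0, 2.0
q_star = sig_x / (sig_x + sig_y)
for n_s in (0, 1, 2, 3):   # one switch is best; extra switches only lose time
    print(n_s, error_rate_switch(q_star, 100, n_s, 5, 0.25, sig_x, sig_y))
```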

Ideal Observer - Unknown Variance
In the previous section it was noted that the ideal observer model made an implicit assumption that the decision maker is capable of instantaneously switching fixation between the two available sources of stimuli. As noted, this assumption is unrealistic for human participants solving visual discrimination tasks. Thus, a switching cost, in the form of a loss of usable sampling time for each switch effected, was incorporated into the ideal observer model.
In this section another assumption made about the decision maker is considered, namely that of known stimulus noise. Recall that the optimal strategy q = σ_x/(σ_x + σ_y) depends on the relative noise levels of the stimuli. From a theoretical standpoint, optimal solutions do not exist for decisions requiring inferences about the means of two normal populations when the variances of the populations are not necessarily equal and the variances themselves (or their ratio) are unknown [6]. Therefore, in order to simplify the problem, we consider an ideal case in which there are two variances, σ_low² and σ_high² (as in the visual discrimination task described in the main text), and the decision-making agent is an ideal observer which, after a brief sampling period, is able to classify an information source as low or high noise. Furthermore, it is assumed that the two variances are equally likely for each of the alternatives, such that the four possible combinations with which the variances of the two sources can be drawn from {σ_low², σ_high²} are all equally likely.
Under such a formulation, the challenge of the agent then becomes how to apportion the available sampling time between the two alternatives, given that the variance of the second alternative will be unknown at the time the switch is made. In order to further simplify the task we shall begin by assuming that the agent makes only a single switch between the two alternatives; later, in Section 4.2, we will consider under what conditions the agent's performance would be best served by making multiple switches.

Single Switch
Given this formulation of the unknown variance case, and assuming without loss of generality that X is initially fixated upon, such that σ_x is known and σ_y is unknown, the expected error rate of the decision maker can be written as follows: Here P_Y(σ_low²) and P_Y(σ_high²) are the prior probabilities that σ_y² = σ_low² and σ_y² = σ_high² respectively.
With the expected error rate dependent on the value of σ_x, there are two distinct error rates, which shall be denoted ER(q, T)|σ_x=σ_low and ER(q, T)|σ_x=σ_high. Considering the case σ_x² = σ_low² first, the value of q which yields the smallest expected error rate can be found by taking the derivative of Equation 4.1, shown below, and solving for stationary points.
Unfortunately, it is not possible to determine analytically the values of q which correspond to the stationary points of this function. However, although we cannot determine the exact value of q corresponding to the minimum of the error function, we can be sure that its value lies between the optimal values for the cases σ_y = σ_low and σ_y = σ_high, as shown below.
This inequality follows from the analysis of the ideal observer's error rate in Section 2, where it was shown that the error rate for the known variance case has a single turning point within the acceptable range of sampling strategies (0 ≤ q ≤ 1). With that point (q = σ_x/(σ_x + σ_y)) corresponding to a minimum of the error rate, deviations in sampling strategy from q = σ_x/(σ_x + σ_y) within the range 0 ≤ q ≤ 1 result in an increase in error rate, with larger deviations yielding larger increases.
Thus, as q approaches the lower bound of the inequality from below (q < σ_low/(σ_low + σ_high)), the argument of the normal CDF function (−E(LLR|H_i)/√Var(LLR)), and thus the error rate, will be decreasing for both σ_y² = σ_low² and σ_y² = σ_high². Conversely, once q moves past the upper bound (q > σ_low/(σ_low + σ_low) = 1/2), the argument and error rate will be increasing for both σ_y² = σ_low² and σ_y² = σ_high². As the value of q is varied from the lower bound to the upper bound, the argument and error rate increase for σ_y² = σ_high² and decrease for σ_y² = σ_low². Thus, since the expected error rate for the unknown variance case is a weighted average of the error rates for the two known variance cases, there exists a point between these two bounds at which the expected overall error rate is minimised.
Similarly to the above, when σ_x² = σ_high² the region containing the optimal strategy q* is bounded as follows: σ_high/(σ_high + σ_high) = 1/2 ≤ q* ≤ σ_high/(σ_high + σ_low). In Figure 4.1(b) the two regions containing the optimal sampling strategy for the conditions σ_x = σ_low and σ_x = σ_high are illustrated, along with the optimal strategies for the four known variance cases (Figure 4.1(a)) for comparison. From Figure 4.1(b) it can be seen that the regions of the sampling space containing the optimal sampling strategy under the two conditions σ_x = σ_low and σ_x = σ_high do not overlap. Thus, for σ_low ≠ σ_high, the optimal sampling strategy cannot be determined until X has been sampled and the value of σ_x is known.
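The bounds on the unknown-variance optimum can be checked numerically. The sketch below averages the reconstructed closed-form error rate over the two equally likely values of σ_y (with σ_x = σ_low, and illustrative values of δ, T and the noise levels), and confirms that the grid minimum lies between σ_low/(σ_low + σ_high) and 1/2:

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def er(q, T, delta, sig_x, sig_y):
    m, n = q * T, (1.0 - q) * T
    return phi(-2.0 * delta * sqrt(m * n / (m * sig_y**2 + n * sig_x**2)))

def expected_er(q, T, delta, sig_x, sig_low, sig_high):
    """Average error rate when sigma_y is low or high with equal probability."""
    return 0.5 * (er(q, T, delta, sig_x, sig_low)
                  + er(q, T, delta, sig_x, sig_high))

sig_low, sig_high, T, delta = 1.0, 2.0, 100, 0.25
grid = [i / 1000.0 for i in range(1, 1000)]
q_opt = min(grid, key=lambda q: expected_er(q, T, delta, sig_low, sig_low, sig_high))
lo, hi = sig_low / (sig_low + sig_high), 0.5   # bounds on the optimum
print(lo, q_opt, hi)
```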

Multiple Switches
In the previous section, the effect unknown variances have on the optimal sampling strategy when the decision maker is restricted to a single switch between the sources of stimuli was investigated. Under such conditions it was shown that, whilst the optimal strategy itself cannot be found analytically, it lies between the bounds σ_x/(σ_x + σ_high) ≤ q* ≤ σ_x/(σ_x + σ_low). In this section we consider under what conditions the decision maker should choose to make additional switches in fixation, given that the chosen sampling strategy lies between the bounds identified above. In addition, the influence the switching cost has on the switching frequency is considered.
Assuming that the initial switch has already been effected and the variances of both alternatives are known to the decision maker, a second switch will be beneficial if the expected error rate with the second switch is less than without it. Denoting the usable sampling time and sampling strategy by T_1 and q_1 for the single-switch case, and by T_2 and q_2 for the two-switch strategy, this can be written as follows: In addition to the assumptions already stated, it can also be assumed that T_2 ≤ T_1, with equality between the sampling times when there is no switching cost (T_c = 0). Furthermore, since switching back to alternative X can only increase the number of samples drawn from X, it holds that q_2T_2 > q_1T_1 (the number of samples from X being given by m = qT); thus, since T_2 ≤ T_1, q_2 must be strictly larger than q_1. Now, by considering the four possible combinations of variance individually, Equation 4.5, and hence the benefit of an additional switch, can be evaluated. Starting with σ_x = σ_low and σ_y = σ_high, we know from Section 3 that the optimal sampling strategy is q* = σ_low/(σ_low + σ_high). As noted in the previous section, deviations in sampling strategy from q* = σ_x/(σ_x + σ_y) within the range 0 ≤ q ≤ 1 result in an increase in error rate, with larger deviations yielding larger increases. Thus, since it has been assumed that the first-switch strategy q_1 lies within the bounds in Equation 4.4 and since q_1 < q_2, even if T_c = 0, additional switches can only serve to increase the error rate. Likewise, with σ_x = σ_high and σ_y = σ_high the optimal sampling strategy again corresponds to the lower bound, with q* = σ_high/(σ_high + σ_high) = 1/2, and additional switches can only serve to increase the error rate.
Conversely, when presented with the variance combinations σ_x = σ_low, σ_y = σ_low or σ_x = σ_high, σ_y = σ_low, if the first-switch strategy lies below the upper bound outlined in Equation 4.4, then an additional switch (increasing the portion of time allocated to alternative X) could serve to improve the performance of the decision maker. Determining exactly when a second switch is beneficial requires further analysis of the inequality in Equation 4.5. Before considering the two remaining variance combinations individually, it is first noted that, since the error rates under both H_x and H_y are monotonically increasing functions of −E(LLR|H_i)/√Var(LLR), an equivalent inequality for determining the efficacy of an additional switch is given as follows: Assuming that it is beneficial to the decision maker to make a second switch then, from Section 2, it is clear that the decision maker will minimise their error rate with total sampling time T_2 by allocating samples such that q_2 = σ_x/(σ_x + σ_y). Given this assumption and the simplified inequality above, the conditions under which an additional switch is beneficial can be considered for the two remaining combinations of variance.
Starting with the case σ_x = σ_low and σ_y = σ_low, and substituting in the optimal two-switch sampling allocation of q_2 = σ_low/(σ_low + σ_low) = 1/2, the inequality simplifies to T_2 > 4q_1(1 − q_1)T_1. Since 4q_1(1 − q_1) lies in the interval [0, 1], it can be seen that with no switching cost (T_2 = T_1) the inequality is satisfied and an additional switch is always beneficial, whereas as the switching cost grows the inequality will eventually fail to be satisfied and a further switch becomes detrimental to the error rate.
Finally, considering the case σ_x = σ_high and σ_y = σ_low, and substituting in the optimal two-switch sampling allocation of q_2 = σ_high/(σ_high + σ_low), the inequality simplifies as follows: T_2/(σ_low + σ_high)² > q_1(1 − q_1)T_1/(q_1σ_low² + (1 − q_1)σ_high²). In summary, if the second sampled alternative is the noisier one, i.e. σ_y = σ_high, accuracy is maximised by a single switch; but if σ_y = σ_low, an additional switch may improve accuracy under certain conditions.
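The condition for a beneficial second switch in the equal-low-noise case can be checked numerically. Using the reconstructed closed-form error rate, the sketch below compares a one-switch strategy q_1 against adding a second switch that restores q_2 = 1/2, and checks the prediction that the second switch helps exactly when T_2 > 4q_1(1 − q_1)T_1 (all parameter values are illustrative):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def er(q, T_eff, delta, sig_x, sig_y):
    m, n = q * T_eff, (1.0 - q) * T_eff
    return phi(-2.0 * delta * sqrt(m * n / (m * sig_y**2 + n * sig_x**2)))

# Case sigma_x = sigma_y = sigma_low: a second switch moves the allocation
# from q1 back towards the optimum of 1/2, at the price of T_c sampling time.
sig, delta, T1, q1 = 1.0, 0.25, 100.0, 0.35
for T_c in (0.0, 20.0, 60.0):
    T2 = T1 - T_c
    helps = er(0.5, T2, delta, sig, sig) < er(q1, T1, delta, sig, sig)
    predicted = T2 > 4.0 * q1 * (1.0 - q1) * T1
    print(T_c, helps, predicted)
```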

DV Decision Maker - Error Rates and Optimal Sampling Allocation
In the main text it was shown how the log-likelihood ratio used in the ideal observer model (derived in Section 1) could be simplified to yield a decision variable (DV). Importantly, this simplified model was shown to have equivalent performance to the ideal observer model when particular sampling strategies were utilised.
In this section the expected error rate for the DV decision maker and the relationship between this error rate and the sampling strategy used is considered. To begin, in section 5.1 the expected error rate for the DV decision maker is derived. Next, in section 5.2 this error rate is related to that of the ideal observer. Finally, in section 5.3 the effect deviations from the optimal sampling strategy has on the performance of the decision maker is considered.

Expected Error Rate
In this section an expression is derived giving the expected error rate of the DV decision maker in terms of the decision problem parameters and the free parameter q which dictates the sampling strategy. As outlined in the main text, the DV decision maker utilises the decision function δ_DV(x_m, y_n) to determine which of the two available hypotheses, H_x or H_y, to accept: Throughout this section DV(x_m, y_n) and DV shall be used interchangeably to refer to the value of the decision variable having made the observations x_m and y_n, which can be calculated as follows: Here x̄ and ȳ are the mean values of the observations made from sources X and Y respectively, and w = q̄/(1 − q̄), with q̄ representing the assumed or planned sampling strategy.
Thus, as with the optimal observer error rate, at a given interrogation time T, the probability of making an error is given by the integral over the distribution of the DV between −∞ and 0 if H_x is the correct hypothesis, and between 0 and ∞ if H_y is the correct hypothesis. This is shown below in Equation 5.2, where ER_Hx and ER_Hy are used as shorthand to denote the error rates under each of the hypotheses for a given parameterisation of the problem: Again, as with the optimal observer, the decision variable, like the LLR, is a linear combination of the observations from the normally distributed random variables X and Y; thus the decision variable is itself a normally distributed random variable.
Denoting the expected value of the DV by E(DV) and its variance by Var(DV), each can be calculated from the decision problem parameters as follows: With the expressions for the mean and variance of the DV at the decision time T formulated, the DV error rates (Equation 5.2) can be stated in terms of the standard normal CDF, as shown below: Where the argument of the normal CDF function, −E(DV|H_i)/√Var(DV), is shown below: In the next section this result and the equivalent result for the ideal observer (Equation 2.7) shall be compared, to establish how the error rate of the DV decision maker relates to that of the optimal ideal observer decision maker, and to determine how the DV decision maker should allocate samples so as to minimise the expected error rate.

Relationship to Ideal Observer
In the previous section an expression was derived giving the expected error rate of the DV decision maker in terms of the decision problem parameters and the free parameter q which dictates the sampling strategy. In this section the DV error rate expression along with the stages used to derive the DV decision maker from the ideal observer decision maker are used to relate the performance of the decision makers and determine the optimal sampling allocation the DV decision maker should utilise.
In the main text it was shown that when the DV decision maker implements the planned sampling strategy q̄ (q = q̄), the ideal observer and DV decision maker behave identically. With the error rates of the ideal observer (Equation 2.6) and DV decision maker both expressed in terms of the standard normal CDF, the two decision makers can be compared through the arguments of Φ(·). Inspection of these functions, which are shown below and examples of which are plotted in Figure 5.1, reveals that, as expected, the error rates intersect when the DV decision maker implements the planned sampling strategy q = q̄. As discussed previously, the log-likelihood ratio used by the ideal observer is the optimal decision strategy for two-alternative decisions made with a specified decision threshold (η) on the basis of a fixed number of observations [8]. Given the optimality of the ideal observer and the relationship between the error rates of the two decision makers identified above, it can be seen that the DV decision maker's performance is maximised when implementing the planned sampling strategy q̄. Furthermore, since q* = σ_x/(σ_x + σ_y) yields the optimal performance for the ideal observer, planning and implementing a sampling strategy of q = q̄ = q* will also yield the optimal error rate for the DV decision maker.

Deviation from Planned Strategy
In the previous section it was shown that the DV decision maker and ideal observer are equivalent whenever the planned sampling strategy, q̄, coincides with the actual sampling strategy q. Furthermore, since a sampling strategy of q* = σ_x/(σ_x + σ_y) was shown to minimise the error rate for the ideal observer, and since the ideal observer provides the optimal solution to the decision problem [8], this strategy also minimises the error rate for the DV decision maker when q̄ = q = σ_x/(σ_x + σ_y). In this section the performance of the DV decision maker when deviating from the planned sampling strategy is considered.
As shown in Section 5.1, the DV decision maker's error rate, like the ideal observer's, is a function of the standard normal cumulative distribution function (Φ(·)), the argument of which decreases as the value of q is increased. Thus it appears the effect of deviating from the optimal sampling strategy q* = σ_x/(σ_x + σ_y) is to bias the decision maker towards selecting the source from which more samples were drawn than stipulated by q*.
To determine whether this relationship holds for all decision problems, and not just the specific cases illustrated, the derivative with respect to q of the argument −E(DV|H_i)/√Var(DV) can be examined. Multiplying out the denominators and simplifying, the condition for the derivative to be negative can be rearranged so that the sampling strategy q is written in terms of the remaining parameters:

(wμ_yσ_x² + 2w²μ_xσ_y² + w³μ_yσ_y²)/(μ_x + wμ_y) > q(w²σ_y² − σ_x²).

Three distinct cases must then be handled, depending on the sign of w²σ_y² − σ_x². First, when w²σ_y² = σ_x², the RHS of the above reduces to 0, yielding:

(wμ_yσ_x² + 2w²μ_xσ_y² + w³μ_yσ_y²)/(μ_x + wμ_y) > 0.

Next, when w²σ_y² > σ_x², division by w²σ_y² − σ_x² yields the following result:

(wμ_yσ_x² + 2w²μ_xσ_y² + w³μ_yσ_y²)/((μ_x + wμ_y)(w²σ_y² − σ_x²)) > 1 > q.

Finally, when w²σ_y² < σ_x², division by w²σ_y² − σ_x² (a negative quantity, reversing the inequality) yields the following result:

q > −1 > (wμ_yσ_x² + 2w²μ_xσ_y² + w³μ_yσ_y²)/((μ_x + wμ_y)(w²σ_y² − σ_x²)).

With w, μ_x, μ_y, σ_x² and σ_y² all strictly positive, it can be seen that all three of these inequalities are necessarily satisfied. Since the three cases together cover every parameterisation, all valid sampling strategies q ∈ [0, 1] yield a negative derivative, regardless of hypothesis. Thus the effect of deviating from the optimal sampling strategy q* = σ_x/(σ_x + σ_y) is to bias the decision maker towards selecting the source from which more samples were drawn than stipulated by q*.