Factors Limiting Performance in a Multitone Intensity-Discrimination Task: Disentangling Non-Optimal Decision Weights and Increased Internal Noise

To identify factors limiting performance in multitone intensity discrimination, we presented sequences of five pure tones alternating in level between loud (85 dB SPL) and soft (30, 55, or 80 dB SPL). In the “overall-intensity task”, listeners detected a level increment on all of the five tones. In the “masking task”, the level increment was imposed only on the soft tones, rendering the soft tones targets and loud tones task-irrelevant maskers. Decision weights quantifying the importance of the five tone levels for the decision were estimated using methods of molecular psychophysics. Compatible with previous studies, listeners placed higher weights on the loud tones than on the soft tones in the overall-intensity condition. In the masking task, the decisions were systematically influenced by the to-be-ignored loud tones (maskers). Using a maximum-likelihood technique, we estimated the internal noise variance and tested whether the internal noise was higher in the alternating-level five-tone sequences than in sequences presenting only the soft or only the loud tones. For the overall-intensity task, we found no evidence for increased internal noise, but listeners applied suboptimal decision weights. These results are compatible with the hypothesis that the presence of the loud tones does not impair the precision of the representation of the intensity of the soft tones available at the decision stage, but that this information is not used in an optimal fashion due to a difficulty in attending to the soft tones. For the masking task, in some cases our data indicated an increase in internal noise. Additionally, listeners applied suboptimal decision weights. The maximum-likelihood analyses we developed should also be useful for other tasks or other sensory modalities.


Introduction
In experiments on multitone intensity discrimination (also termed multitone pattern discrimination), the stimulus consists of several sequentially presented elements and the task is to judge the intensity of this multitone stimulus (e.g., [1,2]). The focus of most of these experiments has been on estimating the relative reliance or decision weight listeners give to different tones in the pattern, by using so-called molecular psychophysical analyses (cf. [3,4,5,6]), also known as perceptual weight analysis or psychophysical reverse correlation.
Previous research showed that listeners do not make optimal use of the intensity information available in a multitone pattern when the tones differ in sound pressure level [7]. The aim of this study was to identify sensory and decisional factors contributing to this behavior. Lutfi and Jesteadt [7] conducted a multitone intensity discrimination experiment where each stimulus consisted of five sequentially presented tones. The tones alternated in level, so that for example the mean level of the first tone was 80 dB SPL, the mean level of the second tone was 40 dB SPL, and the mean level of the third tone was again 80 dB SPL (see Fig. 1 for a similar stimulus configuration). The task of the listener was to detect a level increment imposed on all of the five tones. The levels of the five tones were randomly perturbed, and the random perturbation was used to estimate decision weights for each of the five tones (cf. [4,6]). The listeners placed much higher weight on the loud tones than on the soft tones, even if the level information from the soft tones was rendered more reliable by presenting a larger level increment on the soft than on the loud tones. This pattern of weights is consistent with other reports of a ''level dominance'' effect [8,9]. For instance, it has been demonstrated that the temporal level profile of longer, level-fluctuating sounds strongly influences the pattern of perceptual weights assigned to individual temporal portions of the sounds when listeners judge the overall intensity. For sounds beginning with a gradual increase in level, the weights assigned to the attenuated (''fade in'') part are close to zero [2]. For sounds beginning with a gradual decrease in level, on the other hand, the first part of the sound receives the highest weight [5]. Both observations are compatible with attention to the loudest elements [8]. Notably, in an additional experiment by Lutfi and Jesteadt [7] where the loud elements were wideband noise bursts rather than pure tones, listeners placed the highest weights on the more reliable soft tones. Lutfi and Jesteadt interpreted these data as to showing that the spectral difference between soft and loud sounds facilitated the direction of attention to the soft elements.
Do the different weights assigned to the soft and the loud tones in the multitone stimulus thus represent a purely ''cognitive'' effect in the sense of a suboptimal decision strategy? The motivation for the present study was that there is an alternative explanation, and our aim was to decide between the two explanations. Lutfi and Jesteadt [7] assumed that the intensity resolution for the soft and the loud tones did not differ dramatically. More specifically, they assumed the intensity resolution for each tone embedded in the multitone sequence to be identical to intensity resolution in quiet, where the near-miss to Weber's law [10] describes the slightly higher intensity resolution for tones with a higher sound pressure level. However, the intense tones could have acted as nonsimultaneous maskers for the softer tones. As described above, in one condition studied by Lutfi and Jesteadt, the mean level of the loud tones was 80 dB SPL, the mean level of the soft tones was 40 dB SPL, and the ISI between for example the first (loud) tone and the second (soft) tone was 100 ms (cf. Fig. 1). This configuration is similar to the stimuli presented in a forwardmasked intensity-discrimination task (see Fig. 1 in [11]), where a strong masker-induced elevation of the intensity difference limens for the 40 dB SPL tones would be expected if combined with an 80 dB SPL masker at the described temporal configuration [12]. Thus, if non-simultaneous masking played a role, then the intensity difference limen (DL) for the 40 dB SPL tone might have been 5-20 dB higher than for the 80 dB SPL tone (e.g., [11]). A strong DL elevation would have rendered the level increment of 6 dB placed on the soft tones in the experiment by Lutfi and Jesteadt virtually undetectable. In such a case, the optimal strategy would be to place only small weight on the soft tones. Put differently, the listener should rely mainly on level information from the loud tones, even if he or she was in principle able to attend to the soft tones. The observation that the soft tones received higher weight if the loud tones were replaced by noise bursts at the same sound pressure level is also compatible with an explanation based on intensity resolution because the wideband noise maskers contained much less energy in the auditory filter centered at the signal frequency, and can therefore be assumed to have had a smaller impact on intensity resolution for the target tones [12].
On a more general level, both sensory and decisional (cognitive) factors may have contributed to the weighting pattern observed by Lutfi and Jesteadt [7] in the multitone paradigm. We conceptualize the performance in the multitone intensity discrimination task in terms of a very simple processing model. In the first stage (sensory processing), each tone is processed by the auditory system, resulting in a representation of the intensity of each tone. These representations are stored in a memory system. The second stage (decision stage) then combines the level representations and outputs a decision according to some decision rule. For reasons of simplicity, we conceive the combination of information to be lossless, i.e., in the signal detection theory tradition there is no decision noise. Now, the representation of the intensity of the soft tones available at the decision stage could be impaired by the presence of the loud Figure 1. Trial configuration in the multitone intensity-discrimination task with alternating loud and soft tones. Panel A shows the overall-intensity task. Each trial contained three loud tones (L) and two soft tones (S). The level of each tone was sampled independently from a normal distribution. Mean levels (m) are represented by the thick black lines. The actual levels presented in the example trial are shown by the dashed lines. In the example, a level increment of 4 dB (DL L ) is added to the loud tones and a level increment of 6 dB (DL S ) is added to the soft tones, indicated by the double line. The task was to decide whether the overall loudness of the trial was loud (i.e., contained the increments) or soft (i.e., did not contain the increments). Panel B shows the masking task in which the level increments were presented only on the soft tones (i.e., targets), and the loud tones were to be ignored (i.e., maskers). The task was to decide whether the two soft tones (targets) contained the increments. Listeners were instructed to ignore the loud tones (maskers). doi:10.1371/journal.pone.0079830.g001 tones (i.e., temporal masking), for example due to adaptation in the auditory nerve [13]. This loss of information in the first stage of our simple model can be described as an increase in internal noise [14], and for the sake of simplicity we make no attempt to distinguish between sensory noise and memory noise [15]. The higher weights assigned to the loud tones could then be regarded as optimal because the information about the intensity of the loud tones available at the decision stage is more reliable than the information about the intensity of the soft tones [16]. Alternatively, the representation of soft-tone intensity available at the decision stage could be unaffected by the loud tones (i.e., no increase in internal noise) but the listeners could assign non-optimal decision weights. The latter explanation would be in line with the attention hypothesis put forward by Lutfi and Jesteadt [7]. To determine which of the two factors play a role in the multitone intensity-discrimination task, we estimated the internal noise in the different conditions via a novel type of maximum-likelihood analysis.
Given the background of previous research, we expected suboptimal decision strategies to play a more important role than increases in internal noise. First, in multitone discrimination tasks involving judgments of frequency rather than intensity, effects of relative tone level have been observed at ISIs up to 500 ms [8,9]. At this temporal separation, effects of forward masking on frequency discrimination should be negligible [17], although backward masking was reported to be present for intervals up to 240 ms [18,19]. Thus, previous studies suggest that the effect of relative tone level on the decision weights is at least partly caused by factors other than non-simultaneous masking. However, for intensity discrimination, effects of non-simultaneous maskers have been reported at considerably longer masker-target ISIs than in frequency discrimination [12], and backward maskers cause similar effects as forward maskers [20,21,22]. Specifically, in the stimuli presented by Lutfi and Jesteadt [7], the temporal separation between adjacent tones was just within the range for which strong effects of non-simultaneous masking were reported in an intensity discrimination task [12,20]. Thus, temporal masking may have played a role here.
However, this is not to imply that decisional factors play no role in experiments on intensity discrimination under non-simultaneous masking. Here, an important finding is the so-called mid-level hump [13]. The mid-level hump refers to the non-linear relationship between the level of the standard and the elevation of the intensity DL caused by an intense non-simultaneous masker (e.g. 90 dB SPL). The DL-elevation is more pronounced for a midlevel standard than for standards low or high in level. This result has been explained by several authors with a focus on reduced precision of the information about target intensity, either at the level of the auditory nerve or at later processing stages (for reviews see [11,23]). Thus, in terms of our simple model, the maskers were assumed to increase the internal noise present at stage 1. However, previous studies from our lab showed that the failure of selective attention to the targets is a useful framework for understanding the effects of non-simultaneous masking on intensity resolution [11,22,23,24,25]. This argues for suboptimal information integration in the decision stage. The hypothesis that the excessive weight placed on the loud tones in the multitone intensity discrimination task is largely due to non-optimal weighting strategies is in line with this concept.
Thus, in terms of both processing stages of our simple model, the performance in a multitone intensity discrimination task involving a sequence of tones alternating in level might be related to classical forward-masked intensity discrimination. For this reason, we expected a mid-level hump pattern [13] in the multitone paradigm, which would be a qualitative indication of the similarity between listeners' performance in the two tasks. We presented three different combinations of the level of the loud and soft tones. Our hypothesis was that if the mean level of the loud tones is fixed at 85 dB SPL, then the decision weights assigned of the soft tones will be even lower if they are presented at 55 dB SPL than if they are presented at 30 dB SPL.

Methods
To estimate the decision weights assigned to the different tones in the multitone stimulus, a one-interval, absolute identification task [26] was used. As depicted in Fig. 1, each trial included five tones alternating in mean level as in the experiments by Lutfi and Jesteadt [7,27]. The levels of all tones were randomly perturbed. The mean level of the loud tones was fixed at 85 dB SPL. The mean level of the soft tones was either 30 dB SPL, 55 dB SPL, or 80 dB SPL, resulting in masker-target level combinations like those in studies of the mid-level hump [13].
Each combination of the level of the soft and the loud tones was studied in two conditions. In one condition, a level increment was placed on all tones, just as in the experiments by Lutfi and Jesteadt [7]. Therefore, the task could be described as a global loudness judgment [28] where listeners judged the loudness of the stimulus in its entirety (i.e., over all five tones). In the other condition, the increment was placed only on the soft tones. As a consequence, like in a typical forward-masked intensity-discrimination task (e.g., [13]) the soft tones were ''targets'' and the remaining tones were task-irrelevant ''maskers''. In the latter condition, the listeners were instructed to judge the intensity of the soft tones and to ignore the loud tones.

Ethics Statement
The experiment was conducted according to the principles expressed in the Declaration of Helsinki. All listeners participated voluntarily after providing informed written consent, after the topic of the study and potential risks had been explained to them. They were uninformed about the experimental hypotheses. The study was approved by the ethical review board of the Department of Psychology at the Johannes Gutenberg-Universitä t Mainz.

Participants
Seven students at the Johannes Gutenberg-Universitä t Mainz participated in the experiment voluntarily (6 female, 1 male; aged 21-30 years). They either received partial course credit or were paid for their participation. All listeners reported normal hearing. Detection thresholds measured by Békésy tracking [29,30] with pulsed 270-ms tones including 10-ms cos 2 on-and off-ramps were better than 20 dB HL between 125 Hz and 8 kHz. Listeners were screened for acceptable detection thresholds under forward masking (see section Screening below). Furthermore, they were screened for showing a mid-level hump in an intensity-discrimination task with only one forward masker. It is known from previous experiments that for some subjects the masker-induced DL elevation is maximal at the lowest rather than at an intermediate target level (for a discussion see [23]). In terms of an ideal observer argument, for these listeners the weights should increase monotonically as a function of mean presentation level. As a consequence they cannot be used to test the hypothesis of this study pertaining to the mid-level hump.

Stimuli and Apparatus
The multitone stimulus consisted of 1000 Hz pure tones presented to the right ear, with a steady-state duration of 20 ms, gated on and off with 5-ms cosine-squared ramps. Each sinusoid started at zero phase. The multitone stimulus comprised a sequence of five tones separated by silent intervals of 100 ms, measured between zero-voltage points (see Fig. 1). The tones alternated in mean level, with the second and fourth tone having softer mean levels (''soft tones'') than the other three tones (''loud tones'').
In the overall-intensity task, a level increment -that is, a pure tone of the same frequency, duration and temporal envelope -was added in-phase to each of the five tones. In the masking task, increments were added only to the soft tones, rendering them ''targets'' because the loud tones (''maskers'') no longer provided any task-relevant information. The sound pressure levels of the loud tones were drawn independently from a uniform distribution spanning a symmetric range of q = 12 dB, centered symmetrically on the mean level m loud = 85 dB SPL. This distribution has a standard deviation of s = ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 1=12 : q 2 p = 3.46 dB [31]. The soft tones were drawn independently from a uniform distribution with mean m soft = 30 dB SPL, 55 dB SPL, or 80 dB SPL, again with a range of 12 dB.
To account for the near-miss to Weber's law (cf. [7,10]), that is, lower intensity resolution at low compared to high target levels, we adopted level increments of DL L = 4 dB for the loud tones and DL S = 6 dB for the soft tones. The inter-trial interval was 2000 ms, with the restriction that the next trial never started before the response and the feedback to the preceding trial had been given. A trial started with a 300-ms visual attention signal provided by two LEDs, followed by a silent interval of 500 ms, and then the onset of the first tone. The LEDs also visually marked the task-relevant target tones, and were used to provide trial-by-trial feedback.
The stimuli were generated digitally, played back via one channel of an RME ADI/S D/A converter (f s = 44.1 kHz, 24-bit resolution), attenuated by a TDT PA5 programmable attenuator, buffered by a TDT HB7 headphone buffer, and presented to the right ear via Sennheiser HDA 200 circumaural headphones calibrated according to IEC 318 [32]. The experiment was conducted in a double-walled sound-insulated chamber.

Procedure and Design
Multitone intensity discrimination with alternating loud and soft tones. For the multitone sequence of five tones alternating in level, two different tasks were presented. In the overall-intensity task, level increments DL were added to each of the five tones; in the masking task they were added only to the soft tones. Each trial had an a priori probability of 50% to contain the increments. Listeners were instructed to indicate if the just presented multitone stimulus had been soft or loud compared to previous trials in a given block. Thus, an absolute identification task [26] with a virtual standard (e.g., [33]) was used. One could also describe the task as a sample discrimination task [1,34,35] where the listeners decided whether the tone levels had been drawn from the ''loud distribution'' (i.e., increments present) or the ''soft distribution'' (increments absent). In the overall-intensity task, to emphasize the necessity to judge the overall loudness of the trial, LED 1 was switched at the onset of the first tone, and was switched off at the offset of the fifth tone. In the masking task, the task was to judge the intensity of only the two soft tones (''targets''), and to ignore the loud tones (''maskers''). LED 1 was switched on in synchrony with the targets. Visual trial-by-trial feedback was provided. Subjects were informed that due to the random level perturbations the feedback could be counterintuitive on some trials.
Apart from the decision weights, we were interested in the sensitivity for detecting the increment. If binary responses (e.g., ''soft'' versus ''loud'') are obtained in a one-interval task, then d9 can only be estimated from the hit and false-alarm rates under the assumption of equal-variance Gaussian distributions for ''signal'' and ''noise'' on the internal continuum (cf. [36,37]). Therefore, a rating response was used and the area under the ROC curve (AUC) was computed, which is a valid measure for sensitivity that does not require strong assumptions about the internal distributions (e.g., [38,39]). AUC corresponds to the proportion of correct responses obtained with the same stimuli in a forced-choice task [36,40,41,42]. The rating scale comprised four ordered response categories (''soft -rather sure'', ''soft -rather unsure'', ''loudrather unsure'', ''loud -rather sure''). Sensitivity in terms of the AUC was computed for each experimental block (105 trials, first five trials excluded from the analysis) and then transformed to d9, using the relation d9 2I = ffiffi ffi 2 p z(AUC), where z(AUC) is the standard normal deviate corresponding to the proportion AUC (cf. [38,40,41,43]). The advantage of using d9 rather than AUC is that d9 can be viewed as a linearization of the binomial quantity AUC, and d9 is often found to be linearly related to stimulus magnitude (e.g., [44,45]). A linearization is also desirable because repeated-measures ANOVAs are sensitive to departures from normality [46]. For each combination of task (overall-intensity task, masking task) and mean level of the soft tones (m soft = 30, 55, 80 dB SPL), five blocks of 105 trials and thus a total of 525 trials were presented. The first five trials of each presentation block were removed from further calculations, reflecting the necessity to first establish a decision criterion at the beginning of an experiment block. Therefore, 500 trials per listener and condition entered the analysis. Only one of the two tasks was presented during one session, in order to avoid confusion between the two different tasks. Only one value of m soft was presented in each block. The order of blocks was randomized in each session.
To obtain information about intensity resolution for only the soft or only the loud tones, in four additional conditions only the two soft tones (30, 55, or 80 dB SPL), or only the three loud tones (85 dB SPL) were presented. These stimuli were generated simply by deleting the loud or the soft tones, respectively, from the fivetone sequences. Thus, in the soft-tones-only and loud-tones-only tasks, all tones were presented at the same temporal position as in the five-tone sequences. The same random level perturbations and level increments as in the five-tone sequences were used. For each of the four different mean levels, 420 trials were collected (also omitting the first five trials per block, i.e. 400 trials were entered into the data analyses).
Estimation of decision weights. The individual trial-bytrial data obtained for each combination of task (overall-intensity, masking, soft-tones/loud-tones-only) and mean level of the soft tones were analyzed to estimate the decision weights. Multiple logistic regression (PROC LOGISTIC of SAS 9.2) was used to estimate the influence of the level of each individual tone on the response of the listener (cf. [2,27,28,47]). The rating of perceived target loudness (''soft -rather sure'', ''soft -rather unsure'', ''loud -rather unsure'', and ''loud -rather sure'') served as the dependent variable and the two to five tone levels served as predictors, which were entered simultaneously. The regression coefficients were taken as the weight estimates. For a given tone, a regression coefficient equal to zero means that the level of the tone had no influence at all on the decision to judge the target tones as being either soft or loud. A regression coefficient greater than zero means that the probability of responding that the target tones of the presented sequence were perceived as loud rather than soft increased with the level of the given tone. A regression coefficient less than zero indicates the opposite relation between the level of one tone and the probability of responding that the targeted tones were perceived as loud rather than soft.
This analysis is based on a decision model which assumes that listeners use a decision variable.
where L i is the sound pressure level of tone i, k is number of tones, L is the vector of tone levels, w i is the perceptual weight assigned to tone i, and c j is a constant representing the decision criterion for the j th of the four ordered response categories (cf. [6,28,48]). In other words, D j (L) is a weighted average of the different sound pressure levels of tones. As we have a four-category response variable Y we assumed a proportional-odds model [49] according to which where J is the number of ordered response categories. This model applies simultaneously to all J21 cumulative probabilities, and it assumes an identical effect of the predictors for each cumulative probability [48]. Due to the difference in mean level between ''increment present'' and ''increment absent'' trials, the tone levels were correlated. Although multiple logistic regression should correct for these correlations, to avoid any problems with multicollinearity separate logistic regression analyses were conducted for the trials containing increments vs. no added increments (cf. [6]). Thus, a separate logistic regression model was fitted for each combination of subject, mean level of soft (loud) tones, task and increment (present/absent). As we were interested in the relative contributions of the different tones levels to the decision rather than in the absolute magnitude of the regression coefficients, the weights w i were normalized such that the sum of the absolute values of the weights was unity (see [5]), resulting in a set of relative temporal weights for each fitted model.
Detection thresholds. Detection thresholds under masking and in quiet were obtained in a 2I, 2AFC task. The second and fourth tones of the multitone sequence (i.e., the soft tones) were presented in one of the two observation intervals. The target tones occurred with equal probability in the first or second interval. In the detection task under masking, three maskers corresponding to the first, third and fifth tone of the multitone sequence were presented at the maximal masker level from the discrimination task (95 dB SPL). This level corresponds to a mean tone level of 85 dB SPL, a level perturbation of +6 dB, and a 4 dB level increment. No random level perturbation was applied, either to the maskers or to the signal. The two observation intervals were separated by a silent interval of 600 ms. An adaptive procedure with a 3-down, 1-up tracking rule was used [50]. Initially, the signal level was 40 dB SPL. The step size was 8 dB until the fourth reversal, and 2 dB for the remaining eight reversals. A track ended when 12 reversals had been obtained or when 70 trials had been presented, whichever occurred first. Visual trial-by-trial feedback was provided. The threshold level was computed as the arithmetic mean of the signal levels at the final eight reversals. A track was discarded if the standard deviation of the latter signal levels was greater than 6 dB. For both detection under masking and detection in quiet, four adaptive blocks were obtained.
Screening. To screen for an acceptable threshold under masking, forward-masked detection thresholds were measured prior to the main experiment. Again, the masker level was 95 dB SPL. The criterion for participation in the main experiment was a threshold of 19 dB SPL or lower. Because the minimum target level was 24 dB SPL in the main experiment, this ensured that the target tones were at least 5 dB above the detection threshold in all trials of the main experiment. Three adaptive blocks were obtained, using the procedure described above.
To screen for a mid-level hump in a typical forward-masked intensity-discrimination task, prior to the main experiment intensity DLs were measured, using the same one-interval task as in the main experiment but presenting only one forward masker and one target. The same frequency, tone duration, and maskertarget ISI as in the multitone task were used. The same random level perturbations as in the multitone tasks were applied to the target. No random level perturbation was applied to the masker. A fixed level increment was added to the target, no increment was added on the masker. The listeners were instructed to ignore the masker and to judge whether the target was loud or soft, using the rating scale introduced above. Visual trial-by-trial feedback was provided. A forward masker with a level of 85 dB SPL combined with either a mean target level of 30 or 55 dB SPL was presented. Although the definition of the mid-level hump implies both a lower and upper comparison, we did not include a high-level target to save experimentation time because without exception all previous studies showed very small DL-elevations at high target levels. Cumulative-normal psychometric functions relating the (randomly varying) sound pressure level of the target to the proportion of trials on which the listener responded that the loud target had been presented [51] were fitted for each block, using a maximumlikelihood method. The DL was defined as half the difference between the 75%-point and the 25%-point on the psychometric function.
For each listener and for each mean target level 420 trials were obtained. Again, the first five trials of each block were excluded from further analyses, i.e. 400 trials were used. Listeners were included into the main experiment if they showed an at least 4 dB higher DL for the mean target level of 55 dB SPL than for the mean target level of 30 dB SPL. This resulted in the exclusion of 4 out of a total of 11 listeners. Averaged across the listeners selected for the experiment, the difference between the DL at the 55 dB SPL target level (M = 25.67 dB, SD = 8.70 dB) and the DL at the 30 dB SPL target level (M = 12.86 dB, SD = 3.94 dB) was 12.81 dB (SD = 5.9 dB), showing a substantial mid-level hump.
Sessions. Each listener participated in a total of eleven experimental sessions, each with a duration of approximately 55 minutes. The first two sessions served as screening sessions for acceptable thresholds under masking as well as for showing a midlevel hump pattern in the individual data under forward masking. Audiometric thresholds were measured in session 1. In session 3, practice blocks for all tasks were presented. In the remaining eight sessions constituting the main experiment, two alternating configurations were presented. The even-numbered sessions presented four blocks of the masking task and two blocks of the soft2/loud-tones-only conditions. Each block contained 105 trials. The order of tasks was randomly selected. The four blocks of the masking condition were randomly selected from the set of three mean levels of the soft tones. The two blocks of the soft2/loudtones-only conditions were randomly selected from the four different conditions (soft-tones-only with m soft = 30, 55, 80 dB SPL, loud-tones-only). The odd-numbered sessions presented four blocks of the overall-intensity task and two blocks of the soft2/ loud-tones-only condition. Again, the task order was randomly selected. Additionally, at the end of sessions 4 to 12, one block of the detection task (in quiet or under forward masking) was presented.

Detection Thresholds
The average detection threshold (corresponding to 79.4% correct) under masking (M = 12.23 dB SPL, SD = 4.25 dB; maximum individual value 18.3 dB SPL) was significantly higher than in quiet (M = 8.58 dB SPL, SD = 2.37 dB), t(6) = 4.07, p,.01, Cohen's [52] d z = 1.54. As the minimum level of the tones in the intensity-discrimination task was 24 dB SPL, the target tones were presented more than 5 dB above threshold on all trials.

Decision Weights
Before reporting the statistical analyses, we describe the pattern of average normalized decision weights for the overall-intensity task and the masking task.
Panel A of Fig. 2 depicts the decision weights for the overallintensity task. Compatible with the results by Lutfi and Jesteadt [7], the weights assigned to the loud tones were much higher than the weights assigned to the soft tones at the two lower mean levels of the soft tones (m Soft ). As expected, at m Soft = 80 dB SPL the weights were more equally distributed. However, at m Soft = 30 dB SPL the weights assigned to the soft tones were not higher than at m Soft = 55 dB SPL. Thus, no mid-level hump-like weighting pattern existed for the overall-intensity task.
The average normalized weights in the masking task are shown in Panel B of Fig. 2. For all mean levels of the soft tones, the to-beignored loud tones received higher weights, but especially so at the intermediate level of the soft tones. Thus, the data of the masking task exhibited the expected mid-level hump type of weighting pattern. The 95% confidence intervals (CIs) indicate higher interindividual differences in the masking task than in the overallintensity task.
The normalized decision weights were analyzed with a repeated-measures analysis of variance (rmANOVA) with the within-subjects factors tone number (tone 1 to tone 5), mean level of the soft tones (m soft ), task, and presence of the increment. A univariate approach with Huynh-Feldt [53] correction for the degrees of freedom (df) was used, which shows good control of the Type I error rate for small sample sizes [46]. The df correction factorẽ e is reported. Partial g 2 is reported as a measure of association strength [54]. The same type of rmANOVA was used for the analysis of sensitivity and internal noise (see next two sections Sensitivity and Estimates of internal noise). We used an a-level of.05 for all analyses. Post-hoc pairwise comparisons were conducted using separate paired-samples t-tests [55] and Hochberg's [56] sequentially acceptive step-up Bonferroni procedure which controls the familywise Type I error rate.
There was a significant effect of tone number, F(4, 24) = 13.66, p,.001, g 2 p = .695,ẽ e = .566, reflecting the weight differences between loud and soft tones in both tasks. The more uniform weighting profile in the masking task compared to the overallintensity task resulted in a significant interaction between tone number and task, F(4, 24) = 6.57, p = .003, g 2 p = .52,ẽ e = .768. The ANOVA also showed a significant interaction between tone number and m soft , F(8, 48) = 5.02, p,.001, g 2 p = .46,ẽ e = .958, because averaged across both tasks the differences in weights between loud and soft tones were especially pronounced for the sequences containing the 55 dB SPL soft tones and less evident for the sequences containing the 80 dB SPL soft tones. No further main effects or interactions were significant (all p-values ..136).
To further analyze the tone number6m soft interaction, two separate three-factorial rmANOVAs (tone number6m soft 6presence of increment) were computed, one for each task. (a) For the overall-intensity task, both the effect of m soft , F(2, 12) = 10.41, p = .006, g 2 p = .63,ẽ e = .772, and the effect of tone number, F(4, 24) = 58.74, p,.001, g 2 p = .91,ẽ e = .371, were significant. Also, the m soft 6tone number interaction was significant, F(8, 48) = 12.39, p,.001, g 2 p = .67,ẽ e = .719. These results reflect the large differences in weights between soft and loud tones in the overallintensity task and that these differences are smaller for the sequences containing the 80 dB SPL soft tones. Additionally, the interaction between m soft and presence of increment was significant, F(2, 12) = 9.31, p = .011. Inspection of the data showed that this effect was caused by spurious negative weights, which resulted in a difference in average weight because the normalization was computed on the basis of the absolute values of the weights.
(b) For the masking task the effect of tone number was marginally significant, F(4, 24) = 2.74, p = .072, g 2 p = .31,ẽ e = .762, as well as the interaction between m soft and tone number, F(8, 48) = 1.96, p = .097, g 2 p = .25,ẽ e = .756. Thus, the differences in weight between soft and loud tones were somewhat weaker than in the overall-intensity task. The non-significant effect of m soft, F(2, 12) = 0.16, p = .784, reflected the similar weights for the sequences with 30 and 80 dB SPL soft tones and the generally more uniform weighting profile in this task (as well as the large inter-individual differences).
For additional exploration of the descriptive mid-level hump pattern in the masking task, we averaged the weights across the three loud and the two soft tones. A three-factorial rmANOVA (tone type (loud/soft)6m soft 6presence of increment) showed a marginally significant effect of tone type, F(1, 6) = 5.75, p = .052, g 2 p = .49, reflecting the stronger weights on the loud tones independent of m soft . A significant interaction between tone type and m soft , F(1, 12) = 4.53, p = .036, g 2 p = .43,ẽ e = .969, confirmed the expected larger difference in weights between soft and loud tones for the sequences containing the 55 dB SPL soft tones (M D = 0.19, SD = 0.13) and the more uniform pattern of weights (i.e., smaller differences between weights on soft and loud tones) for the sequences containing the 30 dB SPL (M D = 0.05, SD = 0.18) and 80 dB SPL soft tones (M D = 0.07, SD = 0.08). For each mean level of the soft tones, post-hoc paired-samples t-tests were computed between the averaged weights for the soft tones and the averaged weights for the loud tones, resulting in three comparisons. After correction for multiple testing only the comparison at m soft = 55 dB SPL was significant.
For the loud tones in the overall-intensity task as well as for loud-tones-only condition, a decrease of weighting values from the first to the last tone is visible in Panel A and Panel D of Fig. 2. This primacy-effect-like pattern (e.g., [57]), is compatible with previous results from multitone intensity-discrimination tasks (e.g., [2,5,28,58,59,60,61]). To investigate this temporal weighting pattern, we analyzed the weights assigned to the loud tones in the overall-intensity task and in the loud-tones-only condition. The three weights in the overall-intensity task were normalized such that the sum of the absolute values was unity. A 364 rmANOVA with the factors loud tone number (first, third and fifth tone in the sequence) and task (overall-intensity task at m soft = 30, 55, and 80 dB SPL, and loud-tones-only condition) showed a significant effect of tone number, F(2, 12) = 8.34, p = .019, g 2 p = .58,ẽ e = .61, reflecting the monotonic decrease of the weights from the first, over the third, to the fifth tone. Due to the normalization there was no main effect of task. No significant tone number6task interaction was found, F(6, 36) = 1.52, p = .232. Fig. 3 depicts the sensitivity in terms of d9 2I (converted from AUC; see Method section) for each combination of task and m soft . The sensitivity in the loud-tones-only task was slightly lower than in the soft-tones-only conditions. Three paired-samples t-tests (with correction for multiple testing) were computed to analyze the differences between the averaged d9 in the loud-tones-only with the soft-tones-only task for m soft = 30, 55, and 80 dB SPL. For m soft = 30 and 55 dB SPL the differences were not significant. The sensitivity in the loud-tones-only task was significantly lower than in the 80 dB SPL soft-tones-only task. Thus, our use of different increments for the loud and the soft tones was successful in ensuring that the sensitivity for detecting the level increment on the soft tones was at least as high as for the loud tones. The sensitivity in the overall-intensity task was similar to the soft-tonesonly task and increased with the mean level of the soft tones. For the masking task, the sensitivity was particularly low for the sequences containing the 55 dB SPL mean level of the soft tones, compatible with a mid-level hump pattern. In general, the sensitivity was low in this task (d9 ,0.40).
To further analyze the effect of m soft on sensitivity, three separate rmANOVAs were computed. In the masking task, the data showed a mid-level hump pattern, although the effect of m soft was only marginally significant, F(2, 12) = 3.69, p = .063, g 2 p = .38, e e = .91. For the overall-intensity task the effect of m soft was significant, F(2, 12) = 27.54, p,.001, g 2 p = .82,ẽ e = .75, but there was no evidence for a mid-level hump pattern of sensitivity. For the soft-tones-only task the effect of m soft was marginally significant, F(2, 12) = 3.45, p = .066, g 2 p = .37, and the higher sensitivity at higher levels was compatible with the near-miss to Weber's law.

Estimates of Internal Noise
Which factors limit observers' performance in the multitone intensity-discrimination task? As outlined in the Introduction, first, the information about intensity available at the decision stage might be inexact, for example because the loud tones reduce the precision of the information about the intensity of soft tones, which could be modeled as an increase in internal noise [14]. Second, it might be the case that a precise representation of tone intensity is available at the decision stage, but that this information is not used in an optimal fashion [14]. For example, in the masking task the decision might be influenced by task-irrelevant information about masker intensity (as observed by [23]). A reduction in sensitivity could be caused by either of these factors alone, or by a combination of both. Therefore, the sensitivity-based measures like the DL-elevation do not allow for deciding whether presentation of intense tones in a multitone sequence increases internal noise, or results in a suboptimal integration of information, or both. In contrast, with the ''molecular psychophysics'' methods used here, some insight can be gained into these two potential effects of non-simultaneous masking.
We used a maximum-likelihood approach to estimate the internal noise effective in the different experimental conditions. To illustrate this approach, imagine a sequence containing only two tones. On the basis of the type of decision variable assumed in Eq.
(1), and compatible with the usual signal detection model, we assume the decision variable (i.e., value on the internal continuum) to be given by where L 1 is the randomly varying sound pressure level of tone 1 on a given trial (including the level increment on increment-present trials), L 2 is the sound pressure level of tone 2, w 1 and w 2 is the decision weight assigned to the first and second tone, respectively, and e 1 and e 2 are random variables representing the internal noise effective for tone 1 and 2. The internal noise components e 1 and e 2 are assumed to be independent of each other and normally distributed with mean 0 and standard deviation s I1 and s I2 , respectively. The probability of selecting a rating category j = 1…4 (coded as 1 = ''soft -rather sure'', 2 = ''soft -rather unsure'', 3 = ''loudrather unsure'', and 4 = ''loud -rather sure'') is where CDF[N(m,s),c] is the cumulative density function of a normal distribution with mean m and standard deviation s, evaluated at the point c, c 0 = 2', and c 4 = +'. This observer model can be used to obtain maximum likelihood estimates of the weights, and of the total internal noise standard deviation. For any pair of levels (L 1 and L 2 ), the likelihood of the observed rating response is given by Eq. (4). Assuming independence between trials, the total likelihood is the product of the likelihoods of the individual trials. We minimized the negative log likelihood numerically using the Mathematica 9.0 function NMinimize[]. The weights can only be estimated up to a multiplicative constant [6], which presents no problem because we are only interested in the relative weights. Therefore, without loss of generality we set w 1 = 1 or w 1 = 21, depending on whether the probability of selecting a higher rating category increases or decreases with the level of tone 1. As noted by Berg [6], the relative weights are independent of additive internal noise. In Eq. (4), the two internal noise variances only appear in a common term representing the standard deviation of the normally distributed decision variable. Therefore, increasing for example s I1 will flatten out both ''conditional on single stimulus'' (COSS) [6] psychometric functions that describe the relation between for example L 1 and the probability of a given rating response, regardless of the level of the second tone (L 2 ). However, the increase in s I1 will not affect the ratio between the two estimated weights, w 1 /w 2 .
For the soft-tones-only condition, the internal noise can now be estimated from the trial-by-trial data by assuming s I1 = s I2 = s Isoft , i.e., that each stimulus component contributes the same amount of internal noise. For the loud-tones-only condition, a third term representing tone 3 is added to the equations (3) and (4), and it is assumed that s I1 = s I2 = s I3 = s Iloud .
For the five-tone sequences (overall-intensity task and masking task) the mean and standard deviation of the cumulative-normal psychometric function relating the decision variable and the response are m 5T = w 1 L 1 + w 2 L 2 + w 3 L 3 + w 4 L 4 + w 5 L 5 and Isoft5T q , assuming that the internal noise does not differ between the three loud tones (s I1 = s I3 = s I5 = s Iloud5T ) or the two soft tones (s I2 = s I3 = s I-soft5T ). The response probabilities are again given by Unfortunately, it is not possible to obtain separate estimates of s Iloud5T and s Isoft5T , because these parameters appear only in a common term. Yet, we can use the ML-estimate of s 5T obtained from Eq. (5) for an approximate test of whether the internal noise standard deviation was increased in the five tone sequences, compared to sequences presenting only the loud or only the soft tones. To this end, we compare the estimate of s 5T to , where w 1 through w 5 are the weights estimated in the five-tone sequence, and s 2 Isoft and s 2 Iloud is the internal noise variance estimated for the soft-and loud-tones-only condition, respectively. If we assume that combining the loud and soft tones into a single, alternating sequence can result in an increase, but not in a reduction of the internal noise effective for each of the two tones types (soft and loud), then if s 5T = s I isolated there was no increase in internal noise, while if s 5T .s I isolated then at least one of the internal noises was increased in the alternating sequence. This rationale corresponds to the observer efficiency [62] measure g noise proposed by Berg [8]. The latter measure is defined as g noise = (d9 obs /d9 wgt ) 2 , where d9 wgt is the sensitivity that would result if the internal noises in the five-tone sequence were identical to the internal noises in the sequences containing only soft or only loud tones, and if the listener applied the decision weights actually observed for the five-tone task. If the observed sensitivity in the five tone task, d9 obs , is smaller than d9 wgt (i.e., g noise ,1), then there was an increase in internal noise.For large samples (i.e., high number of data points, as in the present experiment), maximum likelihood estimates are approximately normal [63], and their asymptotic variance-covariance matrix can be calculated in terms of the Fisher information, by taking the inverse of the Hessian matrix [63]. It is therefore possible to test H 0 : s 5T = s I isolated on an individual basis. The ML analysis of the data from the five-tone sequences provides estimates of the weights, their standard errors (SE), and their covariances, as well as an estimate of s 5T and its SE. ML analyses of the data from the soft2/loud-tones-only conditions provide estimates and SEs of s Iloud and s Isoft . We were not able to find an analytic solution for the standard error of s I isolated . We therefore used a Monte Carlo approach, where we simulated 200,000 values of s I isolated , with w 1 through w 5 , s Isoft , and s Isoft drawn from normal distributions, with means, standard deviations and covariances as estimated via the ML analyses. The mean and standard deviation of the simulated samples of s I isolated were taken as estimates of the population mean and the standard deviation of s I isolated . The simulated weights were multinormally distributed, with mean vector {w 1 , w 2 , w 3 , w 4 , w 5 } and covariance matrix as estimated by the ML analysis of the five-tone conditions. The simulated ''isolated'' internal noises s Isoft and s Isoft were independent and normally distributed with means and standard deviations as estimated by the ML analysis of the soft2/loudtones-only conditions. For a given subject and condition, the test statistic q , can be referred to a standard normal distribution, providing a test of H 0 : s I isolated = s 5T against H 1 : s I isolated ? s 5T . The relative weights obtained by the described maximum likelihood analysis were almost identical to the relative weights provided by the multiple regression analyses presented in section Decision weights (R 2 = .99), compatible with previous results [64,65] and confirming the validity of our analyses. The individual estimates of s 5T and s I isolated for the overall-intensity task are displayed in Fig. 4. At an a-level of.05 (two-tailed), there was no significant difference between s 5T and s I isolated , except for one case where s 5T was even lower than s I isolated . Thus, the data indicate that the combination of the loud and soft tones into an alternating sequence did not increase the internal noise.
Given that there was no evidence for an increase in internal noise, it is interesting to quantify the loss in sensitivity caused by non-optimal decision weights, in particular the low weights on the soft tones. To this end, we estimated the sensitivity that a ''limited ideal observer'' [66] would obtain using the optimal set of decision weights in the overall-intensity task in the absence of any masking effects (i.e., no increase in internal noise compared to sequences containing only loud or only soft tones, as demonstrated by the above analysis). By definition, d9 is equal to the difference between the expected value of the decision variable on increment-present and increment-absent trials, divided by the common standard deviation [36]. For example, in the two-tones sequences considered above in Eq. (3), the decision variable on increment-present trials is given by where L 1P is the randomly varying ''pedestal'' level of the first tone presented on the given trial, DL 1 is the level increment imposed on the first tone, e 1 is a random variable representing internal noise, and w 1 is the decision weight assigned to the first tone. The second term on the right side represents the same parameters for the second tone. Due to the random level perturbations (representing ''external noise''), L 1P and L 2P are independent random variables with mean m 1 and m 2 , respectively, and standard deviation s E1 = s E2 = ffiffiffiffiffiffiffiffiffiffiffi 1=12 : p q 2 , where q = 12 dB is the range of the uniform distribution L 1P and L 2P were sampled from. As a reminder, the internal noise components e 1 and e 2 are assumed to be independent of each other and independent of the external noise, normally distributed with mean 0 and standard deviation s I1 and s I2 , respectively.
For the increment-absent trials, the decision variable corresponds to By definition, d9 is equal to the mean of X incr 2 X noIncr divided by the common standard deviation. Thus, For our setting, the general formula for i = 1…k loud and j = 1…l soft tones is where Sw Li and Sw Sj denotes the sum of the weights assigned to the k loud and l soft tones, respectively, and DL L and DL S are the level increments imposed on the two types of tones. On the basis of the internal noise variances estimated from the data obtained in the soft-and loud-tones-only conditions, we computed the sensitivity d9 OWNM , where the index stands for 'optimal weights, no masking'. This is the sensitivity that would result for the fivetone sequences if (a) there was no increase in internal noise compared to the soft-tones-only or loud-tones-only condition, and (b) the listener applied the optimal decision weights. In the overallintensity task, the listener can use information from five tones. These five observations can be assumed to be independent, because the ''external noise'' (level variation) applied to each tone was independent, we assume the internal noise components to be mutually independent and independent of the external noise, and we assume that there are no masking effects. Therefore, if the listener optimally combines the information from the five tones (integration model; [16]), then the sensitivity is where d9 i is the sensitivity for tone i, which is given by the difference between increment-present and increment-absent trials divided by the standard deviation of this difference. The SD is a function of the external noise (s E = ffiffiffiffiffiffiffiffiffiffiffiffi q 2 =12 p ) and the internal noise (s Ii ). Because DL i = DL L and s Ii = s ILoud for tones 1, 3 and 5, and DL i = DL S and s Ii = s ISoft for tones 2 and 4, The first term under the root is the d9 that can be achieved using information from only the loud tones, and the second term is d9 when using only the soft tones. This optimal sensitivity is obtained if each loud tone receives identical weight (w 1 = w 3 = w 5 = w L ), each soft tone receives identical weight (w 2 = w 4 = w S ), and.
The interpretation of Eq. (12) is simple: the optimum ratio between the weight assigned to the soft tones and the weight assigned to the loud tones is the reliability (level increment divided by the sum of the internal and external noise variance) of the soft tones, divided by the reliability of the loud tones.  The average values of d9 OWNM are displayed in Table 1, together with the observed sensitivity d9 obs . The observer efficiency can be defined as g = (d9 obs /d9 OWNM ) 2 [8,62]. The important difference between our analysis and the efficiency measures proposed by Berg [8] is that he used the sensitivity for an ideal observer with zero internal noise and ideal weights (termed d9 ideal ) as the reference sensitivity relative to which the observer efficiency was computed. Our reference sensitivity, d9 OWNM , differs from this concept because it encompasses the internal noise present in the soft2/loud-tones-only condition. We selected this reference sensitivity because we were interested in changes in internal noise in the alternating-level five-tone sequences relative to the conditions presenting only the soft or only the loud tones, rather than in a comparison to an ideal observer characterized by the (unrealistic) complete absence of internal noise. A similar analysis was used by Alexander and Lutfi [66], who for a multitone frequency discrimination task computed the reference sensitivity on the basis of the frequency resolution for individual frequency components and termed this reference sensitivity d9 PLIO for ''peripherally limited ideal observer''.
Because in the overall-intensity task there was no evidence for increased internal noise, values of g lower than 1.0 represent a loss in sensitivity due to non-optimal weights, and for the overallintensity task g was significantly smaller than 1.0 at all mean levels of the soft tones. Thus, in the overall-intensity task the internal noise was not higher than for the soft tones or loud tones presented alone, but the listeners did not achieve the maximally possible sensitivity because they used suboptimal decision weights. The efficiency was higher at m soft = 80 dB SPL than at the two lower levels of the soft tones. An rmANOVA on g showed a significant effect of m s , F(2,12) = 4.85, p = .035, g 2 p = .45,ẽ e = .88. This pattern is similar to the observed sensitivity (see Fig. 3).
For the masking task, the individual estimates of s 5T and s I isolated are displayed in Fig. 5. The confidence intervals show that in several cases it was not possible to obtain precise MLestimates of s 5T . In one case (listener 6 at m s = 55 dB SPL) no acceptable model fit could be obtained. Unlike in the overallintensity task, s 5T tended to be higher than s I isolated . The difference between the two internal noise measures was significant (p,.05) for four out of six listeners at the lowest, three listeners at the intermediate and one listener at the highest mean level of the soft tones. Thus, for the masking task an increase in internal noise in the five-tone sequences cannot be precluded. Because it is not possible to separately estimate s Iloud5T and s Isoft5T , the increase in internal noise could be due to a reduced precision of the intensity information for both the loud and soft tones, although it seems likely that the presence of the loud tones increased the internal noise effective for the soft tones due to non-simultaneous masking [11]. Apart from the evidence for increased internal noise, the non-zero weights observed on the loud tones in the masking task (see Fig. 2) clearly represent non-optimal weights because no level increment was presented on the loud tones, so that they conveyed no information concerning the correct response. However, it is unfortunately not possible to use the weighting efficiency g wgt = (d9 wgt / d9 OWNM ) 2 that Berg [8] proposed as a measure of the loss in efficiency caused by suboptimal weights. The measure d9 wgt is defined as the sensitivity that would result if (a) the listener applied the decision weights actually observed for the masking task and if (b) there was no increase in internal noise. Because unlike for the overall-intensity task we cannot rule out an increase in internal noise, separate estimates of s Iloud5T and s Isoft5T would again be necessary for quantifying the additional loss in sensitivity in the masking task caused by non-optimal weights. As shown in Table 1, the observer efficiency (g) in the masking task was very low and significantly smaller than 1.0 for all values of m soft .

Discussion
The present experiment studied the relation between forwardmasked intensity discrimination and multitone intensity discrimination. Previous data showed a strong impairment in intensity resolution caused by non-simultaneous maskers (for a review see [11]), and a systematic bias towards low decision weights assigned to soft tones in a multitone intensity-discrimination task presenting an alternating sequence of loud and soft tones [7]. To investigate the relationship between the two tasks we presented a five-tone sequence containing alternating loud and soft tones. The level of the loud tones was fixed at 85 dB SPL, and the level of the soft tones was varied (30,55, and 80 dB SPL), corresponding to the level combinations representing the mid-level hump in forwardmasked intensity discrimination. The task was to detect a level increment on either all tones of the multitone stimulus (overallintensity task), or only on the soft tones while ignoring the loud tones (masking task). The latter condition should allow for a comparison with a typical forward-masked intensity-discrimination task.
As expected, listeners did not assign uniform temporal weights. In the overall-intensity task, the weights on the loud tones exhibited a primacy effect, i.e., a higher weight was assigned to the first tone than to the following loud tones. The same pattern was present in the soft2/loud-tones-only conditions. This is compatible with previous reports of temporal weights in intensitydiscrimination tasks [5,28,59,60,61].
Concerning the decision strategy in the sequences containing both soft and loud tones, we expected two main results. First, in the overall-intensity task we anticipated smaller weights on soft than on loud tones. In the masking task, we expected non-zero weights on the to-be-ignored loud tones, and thus a systematic influence of the uninformative masker-intensities on the decision [23]. Additionally, these effects were expected to be particularly strong at the intermediate level of the soft tones (55 dB SPL). This hypothesis is compatible with the conception that the difficulty in selectively attending to the (soft) target tones in a classical forwardmasked intensity-discrimination task is a major factor for the deterioration of intensity resolution [22,23,24], which is particularly strong at mid-levels. Second, we were interested in whether listeners would only show non-optimal decision weights or if there would be evidence for an increase in internal noise in the five-tone sequences.
As expected, the weights assigned to the soft tones were lower than the weights assigned to the loud tones in the overall-intensity task. The weights on the soft tones were similar and near zero for m soft = 30 and 55 dB SPL, and significantly higher at m soft = 80 dB SPL, compatible with results by Lutfi and Jesteadt [7]. Thus, the weighting patterns did not exhibit a clear mid-level hump like pattern. In the masking task, the loud tones (maskers) received non-zero weights, confirming our expectation that listeners would fail to ignore the maskers. This pattern was most pronounced at the intermediate level of the soft tones, where the weights on the soft tones (targets) were even lower than the weights on the loud tones (maskers), compatible with a mid-level hump. The difference between the weighting patterns observed at the three levels of the soft tones was not significant, however. It should be noted that in the masking task there was also only weak evidence for lower sensitivity at the intermediate than at the low or high target level. Thus, in terms of intensity resolution, the mid-level hump pattern was weaker than we had expected.
What can be concluded about the origin of the reduced sensitivity in the alternating-level sequences? We first discuss the overall-intensity task, where level increments were presented on all of the five tones. Despite the fact that the higher level increments presented on the soft tones slightly overcompensated for the nearmiss to Weber's law (see section Sensitivity), so that the soft tones provided at least the same amount of information as the loud tones, at the two lower soft-tone levels only the loud tones received considerable decision weights while the soft tones were virtually ignored. Regarding the decision about the intensity of the complete sequence, the listener would make optimal use of available information by using the level information from all of the five tones. However, as Figure 3 shows, almost ignoring the information provided by the soft tones still resulted in an acceptable level of performance. Thus, one could argue that the listeners might have adopted an ''economical'' strategy by attending only to the loud tones. Alternatively, it might have been the case that the loud tones acted as forward maskers, rendering the information about the level of the soft tones available at the decision stage imprecise (in the sense of increased internal noise), and thus uninformative. The molecular analyses providing separate estimates of decision weights and internal noise allow rejection of the latter hypothesis, however. In the overallintensity task we found no evidence for an increase in total internal noise compared to the loud-or soft-tones-only conditions. Therefore, the observed weighting pattern in the overall-intensity task cannot be attributed to impaired information about soft tone intensity. Instead, the data are compatible with the concept that the information about the intensity of the soft tones is not used optimally due to a failure of selective attention to the soft tones (cf. [11,22,23,24]), as evidenced by the estimates of observer efficiency (Table 1). Our data are also compatible with the observation of effects of relative tone level at long ISIs in sample discrimination of frequency [9] where ''masking'' effects are very unlikely, and therefore the higher reliance on information from the louder stimulus components can also be viewed as a suboptimal weighting strategy.
For the masking task where only the soft tones provided information concerning the correct response, the relative weights on the soft tones were higher than in the overall-intensity task, except at the intermediate target level (55 dB SPL). Thus, the listeners were in principle able to adjust their decision weights based on the task requirements. However, the decision was systematically influenced by the uninformative intensity of the loud tones. This result is directly compatible with the hypothesis that a failure of selective attention to the targets in an intensitydiscrimination task under non-simultaneous masking is a major factor in the sometimes dramatic reduction in intensity resolution [22,23,24]. In a classical intensity-discrimination task with only one target and one forward-masker, Oberfeld [23] also found that the masker intensity was factored into the decisions. For the masking task, our data indicated an increase in internal noise for some listeners and levels of the soft tones. Why could the combination of the soft and loud tones into an alternating sequence cause an increase in internal noise in the masking task but not in the overall-intensity task? This result is quite unexpected because the temporal sequence of loud and soft tones was virtually identical in the two tasks (apart from the 4 dB level increment placed on the loud tones in the overall-intensity task but not in the masking task). For this reason, any adaptation effects in the auditory periphery caused by the loud tones should have been identical in the two tasks. Put differently, because the two tasks presented virtually the same stimuli but differed in their task requirements (attend to all tones in the overall-intensity task, attend only to the soft tones in the masking task), the increase in internal noise appears to be caused by higher-level processes rather than by peripheral mechanisms. More specifically, our analyses suggest that in the masking task the listeners did not only apply suboptimal weighting strategies, but that there was an additional higher level process causing an increase in internal noise. Additional research is necessary to identify this process. In our observer model (cf. Eq. (3)), the internal noise components associated with each tone are modeled as being 'early', that is, they appear prior to integration. As a consequence, the influence of the internal noise on the decision variable is modulated by the decision weight. If the weight assigned to a given tone is 0, then the internal noise associated with this tone will not contribute to the effective total internal noise. An internal noise source located at or after integration would have a different effect because its contribution is not modulated by the decision weights. Such an additional 'central' internal noise source (e.g., [67]) could be integrated into an extended version of the model. In the present study, we wanted to answer the question of whether the loud tones impair the representation of soft-tone intensity, modeled as an increase in 'early', 'sensory' noise. An additional 'central' noise source would equally affect the information available from the soft tones and the loud tones, and could therefore neither explain the relatively low decision weights assigned to the soft tones in the overall-intensity task, nor the non-zero weights assigned to the loud ''maskers'' in the masking task. More specifically, if the central noise dominated the 'sensory' noise, then the listeners should assign approximately uniform weights, regardless of the internal noise SDs effective for the soft versus the loud tones. This is because in Eq. (12) specifying the optimal ratio between the decision weights for the loud and soft tones, the central noise variance would appear in both the numerator and the denominator. Moreover, the additional noise source would not affect the estimates of relative decision weights, for reasons discussed in Section Estimates of internal noise. However, an increased level of central internal noise in the five-tone sequences compared to the soft-tones-only or loud-tones-only conditions might partly explain the observation of increased total internal noise in some conditions. Data by Lutfi and Gilbertson [68] are evidence against a significant contribution of central noise in a sample discrimination task. However, as they studied sample discrimination of frequency rather than intensity, it remains to be shown if such evidence can be obtained for the task we studied.
Apart from the insights into multitone intensity discrimination, we present a new analysis method for obtaining maximum likelihood estimates of decision weights and internal noise. This method has advantages over the observer efficiency analysis used in previous studies [8,69] because it does not refer to an ideal observer characterized by the complete absence of internal noise.
It would be interesting to apply the same type of molecular analyses to a classical intensity discrimination task instead of the five-tone task studied in the present paper. The method can also be adopted to study other tasks or other sensory modalities. For example, in an experiment on object-based visual attention (cf. [70,71]), the methods described in this paper could be used to quantify the amount of selective attention to the target in the different conditions by measuring the decision weights assigned to target and distractor elements, and a potential increase in internal noise caused by the distractor elements could be identified.