The effect of dopamine transporter blockade on optical self-stimulation: behavioral and computational evidence for parallel processing in brain reward circuitry

The neurobiological study of reward was launched by the discovery of intracranial self-stimulation (ICSS). Subsequent investigation of this phenomenon provided the initial link between reward-seeking behavior and dopaminergic neurotransmission. We re-evaluated this relationship by psychophysical, pharmacological, optogenetic, and computational means. In rats working for direct, optical activation of midbrain dopamine neurons, we varied the strength and opportunity cost of the stimulation and measured time allocation, the proportion of trial time devoted to reward pursuit. We found that the dependence of time allocation on the strength and cost of stimulation was similar formally to that observed when electrical stimulation of the medial forebrain bundle served as the reward. When the stimulation is strong and cheap, the rats devote almost all their time to reward pursuit; time allocation falls off as stimulation strength is decreased and/or its opportunity cost is increased. A 3D plot of time allocation versus stimulation strength and cost produces a surface resembling the corner of a plateau (the “reward mountain”). We show that dopamine-transporter blockade shifts the mountain along both the strength and cost axes in rats working for optical activation of midbrain dopamine neurons. In contrast, the same drug shifted the mountain uniquely along the opportunity-cost axis when rats worked for electrical MFB stimulation in a prior study. Dopamine neurons are an obligatory stage in the dominant model of ICSS, which positions them at a key nexus in the final common path for reward seeking. This model fails to provide a cogent account for the differential effect of dopamine transporter blockade on the reward mountain. Instead, we propose that midbrain dopamine neurons and neurons with non-dopaminergic, MFB axons constitute parallel limbs of brain-reward circuitry that ultimately converge on the final-common path for the evaluation and pursuit of rewards. Author summary To succeed in the struggle for survival and reproductive success, animals must make wise choices about which goals to pursue and how much to pay to attain them. How does the brain make such decisions and adjust behaviour accordingly? An animal model that has long served to address this question entails delivery of rewarding brain stimulation. When the probe is positioned appropriately in the brain, rats will work indefatigably to trigger such stimulation. Dopamine neurons play a crucial role in this phenomenon. The dominant model of the brain circuitry responsible for the reward-seeking behavior treats these cells as a gateway through which the reward-generating brain signals must pass. Here, we challenge this idea on the basis of an experiment in which the dopamine neurons were activated selectively and directly. Mathematical modeling of the results argues for a new view of the structure of brain reward circuitry. On this view, the pathway(s) in which the dopamine neurons are embedded is one of a set of parallel channels that process reward signals in the brain. To achieve a full understanding of how goals are evaluated, selected and pursued, the full set of channels must be identified and investigated.


Introduction
We and our cohabitants are fortunate winners. We have enjoyed reproductive 2 success due to a combination of luck and an array of skills among which acumen in 3 cost/benefit decision making is of paramount importance. Rudimentary ability in this 4 domain can be implemented simply, as is evident from the behavior of animals whose 5 nervous systems comprise only hundreds of neurons [1]. In the multi-million-cell nervous 6 systems of mammals, the foundations of more sophisticated cost/benefit decision 7 making are thought to have been heavily conserved [2,3]. If so, the rodent species so 8 widely studied in neurobiological laboratories are equipped with variants of 9 decision-making circuitry that continues to shape our own choices and actions. 10 A seminal moment in the study of the neural foundations of cost/benefit decision 11 making was the discovery that rats would work vigorously and indefatigably for focal 12 electrical stimulation of sites in the basal forebrain and midbrain [4]. Despite the 13 artificial spatiotemporal distribution of the evoked neural activity, the rats behaved as if 14 procuring a highly valuable, natural goal object, such as energy-rich food. This striking 15 phenomenon, dubbed "intracranial self-stimulation" (ICSS), has been investigated 16 subsequently by means of perturbational, pharmacological, correlational, computational, 17 and behavioral methods that have seen dramatic recent improvements in precision, 18 specificity, and power. For example, the electrical stimulation employed originally 19 activates neurons near the electrode tip with relatively little specificity, whereas 20 contemporary optogenetic methods restrict activation to genetically defined neural 21 populations. Currently employed pharmacological agents are far more selective than the 22 drugs employed in the early studies. Whereas the crude response counts used initially to 23 measure behavioral output are confounded by inherent non-linearity as well as by 24 disruptive side-effects of drugs and motoric activation, contemporary psychophysical 25 methods support inference of the strength and subjective cost of the induced reward as 26 well as the form and parameters of the functions that map observable inputs and 27 outputs into the variables that determine behavioral-allocation decisions. In the current 28 study, we combine, for the first time, direct, specific optical activation of midbrain rewarding electrical stimulation of the MFB [38,45]. The outcome matters in at least 166 two ways. First, it bears on the issue of how the output of midbrain dopamine neurons 167 contributes to reward pursuit. Second, it bears on the series-circuit model that, on 168 casual consideration, appears to integrate the findings obtained by means of direct, 169 specific, optical activation of dopamine neurons with the older findings obtained by 170 means of electrical stimulation that activates dopamine neurons indirectly. 171 Fig 2 illustrates the series-circuit model [10-12, 15, 16], which holds that the The series-circuit model. According to this model, the rewarding effects produced by both electrical stimulation of medial-forebrain-bundle (MFB) neurons and optical stimulation of midbrain dopamine neurons have a common cause: activation of midbrain dopamine neurons. This activation is due to transsynaptic excitation in the case of eICSS and direct excitation in the case of oICSS. The results of prior eICSS studies [38,45,46] require that drugs that perturb dopaminergic neurotransmission alter the computation of reward intensity by actions downstream from the output of a logistic reward-growth function (leftmost box containing an S-shaped curve). The results of the present study require that a similar reward-growth function lies downstream from the dopamine neurons (rightmost box containing an S-shaped curve). The boxed low-pass filter symbols represent the frequency-following functions that map the electrical pulse-frequency into the induced frequency of firing in the directly stimulated neurons subserving eICSS or the optical pulse-frequency into the induced frequency of firing in the dopamine neurons subserving oICSS. K ds (ds stands for "directly stimulated") scales the input to the reward-growth function for eICSS, whereas K da scales the input to the reward-growth function for oICSS. K rg scales the output of the reward-growth functions.
The right half of Fig 2 depicts the generation of the reward-intensity signal 179 subserving oICSS. The four components to the right of the dopamine neurons represent 180 the hypothesis that reward-intensity signals subserving eICSS and oICSS are computed 181 in an analogous fashion. As we will show, this hypothesis is supported by the results of 182 the current study.  estimated objective price at which time allocation is halfway between its minimum and maximum values when frequency-following fidelity is perfect The reward-mountain surfaces fit to the vehicle and drug data from rat Bechr29 are 312 shown in Fig 5, whereas those fit to the data from the remaining rats are shown in the 313 supporting-information file (Figs S17-S22).  Horizontal comparison in Fig 6 shows that blockade of the dopamine transporter shifted 317 the mountain downwards along the pulse-frequency axis, whereas vertical comparison Fig 6. Contour graphs of the reward-mountain surfaces fit to the vehicle and drug data from rat Bechr29. The values of the independent variables along frequency sweeps are designated by red triangles, along price sweeps by blue squares, and along radial sweeps by green circles. The values of the location parameters, P obje and F pulse hm , are indicated by blue vertical lines with diamond end points and red horizontal lines with right-facing triangular end points, respectively. The shaded regions surrounding the lines denote 95% confidence intervals. The vehicle data are shown twice, once in the upper-left quadrant and once in the lower right. The dotted lines connecting the panels designate the shifts in the common-logarithmic values of the location parameters of the mountain, which are designated as {∆P obje , ∆F hm } and plotted in the bar graph in the upper-right panel. The dot-dash cyan lines superimposed on the bars show location-parameter estimates corrected for changes in frequency-following fidelity due to the displacement of the mountain along the pulse-frequency axis. (See section Correction of the location-parameter estimates for changes in frequency-following fidelity in the supporting-information file.) The 95% confidence intervals are shown in the bar graphs as vertical lines.
Location-parameter estimates 328 F hm : The surface-fitting procedure returns the location-parameter value, F pulse hm , 329 that positions the reward mountain along the pulse-frequency axis. In studies of eICSS 330 employing the reward-mountain model, it was assumed that this value corresponded to 331 the induced firing frequency in the directly activated neurons. This firing frequency is 332 the location parameter of the underlying reward-growth function. Given the exceptional 333 frequency-following fidelity of the directly activated neurons subserving eICSS of the 334 MFB [8], it is not unreasonable to assume that each pulse elicits an action potential in 335 most or all of the directly stimulated MFB neurons when the pulse frequency equals 336 F pulse hm . In contrast, the findings reviewed in section Parameters of the 337 frequency-following function for oICSS of the supporting-information file argue 338 that such an assumption is untenable in the case of oICSS of channelrhodopsin-2 339 expressing midbrain dopamine neurons. We therefore used the data from the 340 rate-frequency curves obtained here (Figs 3 and S1-S6) and prior studies (e.g., [54]) to 341 estimate the firing frequencies corresponding to the F pulse hm values. We label these as 342 F * pulse hm (rather than as F f iring hm ) because these values will be plotted along the 343 Table 3. The location of the reward mountain along the price axis. The "log(P * e ) Drg" and "log(P * e ) Veh" columns list the estimated values that the log 10 (P obje ) parameter would have attained in the drug and vehicle conditions, respectively, had frequency following been perfect. The "diff" column shows the differences between the estimates for the drug and vehicle conditions, whereas the "diff Lo" and "diff Hi" columns designate the lower and upper bounds of the 95% confidence interval surrounding these differences. The ⋆ character in the "excl 0" column indicates that zero falls outside the 95% confidence interval surrounding the estimates in the"diff" column. Differences so designated meet our criterion for statistical reliability. The rightmost column lists the proportional drug-induced change in the value of P * e corresponding to the values in the "diff" column. (   Drug-induced shifts in the location parameters of the reward mountain. The ∆log 10 (F * hm ) values give the shifts of the reward-growth function and reward mountain along the pulse-frequency axis, whereas the ∆log 10 P * obje values give the shifts along the price axis. These values have been corrected for the estimated change in frequency-following fidelity due to the drug-induced displacement of the reward mountain along the pulse-frequency axis. According to the reward-mountain model, the ∆log 10 (F * hm ) values reflect action of the dopamine transporter blocker prior to the input to the reward-growth function, whereas the ∆log 10 (P * e ) values reflect drug action at, or beyond, the output of the reward-growth function. ∆log 10 (F * hm ) is shorthand for ∆log 10 F * pulse hm . 372 The corrected values of the location parameters are uncorrelated in both the vehicle 373 (ρ = 0.37, p = 0.41) and drug (ρ = −0.24, p = 0.60) conditions), as are the 374 drug-induced shifts in these values (ρ = −0.2830; p = 0.5385, Fig S29). 375 All parameters of the best-fitting models for each rat are shown in Tabs S10-S15 in 376 the supporting information. (The values of the location parameters in these tables are 377 uncorrected.) 378 Histology 379 Immunohistology and confocal microscopy (Fig 8) confirms that the tips of all show that rescaling the input to the reward-growth function shifts the reward mountain 392 along the pulse-frequency axis, whereas rescaling performed at, or beyond, the output of 393 the reward-growth function shifts the reward mountain along the price axis. 394 Substitution of optical for electrical stimulation [49-51] offers significant advantages: 395 the resulting neural activation is confined to a known, genetically specified, neural 396 population, and the neurons that express the light-sensitive transducer molecule are 397 readily visualized. However, few psychophysical data have been reported concerning the 398 processes that translate the optical input into observable reward-seeking behavior. The 399 results of the present study demonstrate that the dependence of oICSS and eICSS on  The reward-mountain model generalizes successfully to oICSS 414 The shape of the reward-mountain surface fitted to the oICSS data (Figs 5,415 S17-S22) resembles that of the surfaces fit to eICSS data reported 416 previously [8, 37-39, 45, 46, 56, 57]. Time allocation falls as pulse-frequency is decreased 417 and/or price is increased. The resemblance between the shapes of the surfaces fitted to 418 prior eICSS and current oICSS data suggests that the reward-mountain model 419 December 3, 2019 14/44 generalized well to the case of oICSS and that pursuit of rewarding optical or electrical 420 stimulation depends similarly on the strength and cost of reward.

421
Embedded within the reward-mountain model derived in the context of eICSS 422 studies is the notion that reward intensity depends on the aggregate rate of firing 423 induced in the directly stimulated neurons by a pulse train of fixed duration [20][21][22][23]33]. 424 A definitive test of this "counter model" [19] has yet to be reported in the case of oICSS. 425 Nonetheless, there are indications that the counter model holds in this case as well. 426 Ilango and colleagues trained mice to press a lever to receive optical stimulation of 427 ChR2-expressing midbrain dopamine neurons [58]. They showed that lever-pressing 428 rates increased as a function of both pulse duration and optical power, two variables 429 that conjointly determine the number of neurons activated by the optical stimulation. 430 Lever-pressing also increased when an 8-pulse train was delivered at progressively higher 431 pulse frequencies. This latter result cannot be linked unambiguously to the counter 432 model because the train duration covaried with the pulse frequency; in the case of 433 eICSS, the formal relationships between these two variables and reward intensity 434 differ [33]. The frequency-sweep data reported here were obtained with train duration 435 held constant. Thus, they complement and extend the findings of Ilango and colleagues 436 while avoiding the complication of covariation between the pulse frequency and train 437 duration. Time allocation increased systematically as a function of pulse frequency, as 438 the reward-mountain model and the counter model embedded within it predict. 439 Ilango and colleagues also measured changes in lever-pressing rates during 440 performance of oICSS on fixed-interval or fixed-ratio schedules of reinforcement [58]. 441 Response rates declined as either the minimum inter-reward interval or the required 442 number of responses per reward was increased. 443 Fixed-interval schedules set the minimum inter-reward interval, but the subject 444 determines the physical effort expended in harvesting the reward. Although only a 445 single response is required to trigger reward delivery after the fixed interval has elapsed, 446 subjects typically begin responding much earlier and thus expend more effort than is 447 strictly necessary [59]. Fixed-ratio schedules set the number of responses required to 448 trigger reward delivery but cede control of the minimum inter-reward interval to the 449 subject, who can vary response rate so as to bring reward delivery closer (or further) in 450 time, as Ilango and colleagues observed. In contrast to these two classic schedules, the 451 cumulative handling-time schedule employed here fixes both the minimum inter-reward 452 interval and the rate of physical exertion required to secure a reward. This makes it 453 possible to vary the opportunity cost of the reward independently of the required rate of 454 physical exertion, as was done here and in previous experiments carried out in the 455 reward-mountain paradigm. As in the case of the eICSS experiments, increases in the 456 opportunity cost (price) of the reward decreased time allocation systematically (Figs 4,457 S7-S12, 6, S23-S28).

458
Vehicle condition: location parameters and their significance 459 Pulse-frequency axis: In rats working for fixed-duration trains of rewarding, 460 electrical, MFB stimulation, the parameter that positions the reward mountain along 461 the pulse-frequency axis, F pulse hm , depends on the stimulation current [36]. This 462 variable, (in conjunction with the pulse duration [18,60]) determines the number of 463 directly stimulated neurons [19] in a manner dependent upon their excitability and 464 spatial distribution with respect to the electrode tip. By analogy, we expect the number 465 of midbrain dopamine neurons recruited by optical pulses and trains of fixed duration to 466 depend on the optical power, the level of ChR2 expression in the dopamine neurons, 467 and the spatial distribution of these neurons with respect to the tip of the optical probe. 468 December 3, 2019 15/44 Given the inevitable variation in ChR2 expression and probe-tip location, it is not 469 surprising that that F pulse hm varied over a wide range in the vehicle condition, from 470 16.3 -35.8 pulses s -1 (Tab S7). Even after correction for differences in 471 frequency-following fidelity, the estimated firing frequencies induced by the F pulse hm 472 values (i.e., F * pulse hm ) vary more than twofold.

473
Price axis: In principle (and in contrast to F * pulse hm ), the corrected estimate of the 474 parameter that positions the eICSS reward mountain along the price axis, P * obje , can be 475 compared meaningfully across subjects and electrical stimulation sites. The validity of 476 this comparison rests on the assumption that both the value of alternate activities, such 477 as grooming, resting, and exploring, and the effort cost of performing the 478 lever-depression task do not vary substantially across subjects or stimulation sites. If so, 479 then the price at which a maximally intense, stimulation-generated reward equals the 480 value of alternate activities (P * obje ) reflects the maximum intensity of the rewarding 481 effect, independent of the value of the stimulation parameters required to drive reward 482 intensity to its upper asymptote. On this view, the rat is willing to sacrifice more 483 leisure in order to obtain strong stimulation of a"good" eICSS site than a poorer one. Correction of the location-parameter estimates for changes in frequency-following 492 fidelity of the supporting-information file, the P obje estimates from the eICSS studies do 493 not typically require correction for imperfect frequency-following fidelity). The results of 494 this comparison are intriguing.

495
The median of the corrected P * obje values for the vehicle condition of the present 496 study, 11.8 s, is marginally greater than the median of the 54 values reported in 497 previous eICSS studies [8, 37-39, 45, 46, 56, 57], but the range is much more extreme.

498
Whereas P obje values typically vary over a roughly twofold range in the eICSS studies, 499 the P * obje values vary over a more than a tenfold range in the current oICSS dataset. 500 The maximum (56.4) is much greater than any value obtained in the eICSS studies, 501 whereas the minimum (5.4) is smaller.

502
According to the reward-mountain model, the large magnitude of the maximum 503 P * obje value means that direct optical stimulation of midbrain dopamine neurons can 504 produce a much more potent rewarding effect than electrical stimulation of the MFB, at 505 least in an extreme case. Perhaps the maximal intensity of the rewarding effects 506 reported here varies so profoundly across subjects because the activation of different 507 subsets of dopamine neurons is rewarding for different reasons. If so, the large range of 508 P * obje values would add to the accumulating evidence for functional heterogeneity of 509 midbrain dopamine neurons [61][62][63][64][65][66][67][68]. 510 The reward mountain was measured in only seven subjects in the current study, and 511 the dispersion of the tips of the optical probes is modest. Given these limitations and 512 the uncertainly inherent in estimating the location of the optically activated dopamine 513 neurons from the position of the tip, we cannot provide a meaningful assessment of 514 whether the maximum intensity of the rewarding effect, as indexed by theP * obje value, is 515 correlated with the location of the optically activated dopamine neurons. It would be 516 interesting to pursue this issue in a larger number of subjects and across a larger region 517 December 3, 2019 16/44 of the midbrain. Mice can be trained to perform oICSS for stimulation of sites arrayed 518 across much of the medio-lateral extent of dopaminergic cell bodies in the ventral 519 midbrain, from the lateral portion of the substantia nigra to the medial boder of the 520 ventral tegmental area [69]. Does P * obje vary systematically as a function of tip location 521 across the full extents of the midbrain region where dopaminergic cell bodies are 522 located? 523 The reader may be tempted to ascribe the large variation in P * obje values to variables 524 such as across-subject differences in ChR2 expression and the location of the 525 optical-fiber tip. If, as we argue below, the form of the reward-growth function 526 resembles a logistic, such differences would have contributed to the variation in F * pulse hm 527 values and not to the P * obje values. That is so because across-subject differences in 528 ChR2 expression and tip location influence the input to the reward-growth function: 529 the aggregate firing rate induced by the optical stimulation in the midbrain dopamine 530 neurons. In contrast, the scalar that determines the maximum reward intensity (K rg ) 531 alters the output of the reward-growth function (Fig 2), and is one of the components 532 of the P * obje parameter (Eq S30). Thus the reward-mountain model holds that the value 533 of P * obje reflects the maximum intensity of the rewarding effect.

534
According to the argument that the F *  Drug-induced shifts in the position of the reward mountain 539 Tabs 2, S8, S9 show that the dopamine-transporter blocker, GBR-12909, shifted the 540 reward mountain reliably along the pulse-frequency and price axes in all 7 rats. (Only 541 one result is discrepant with those from the rest of the group: the direction of the shift 542 in the data from rat Bechr27, which is also the smallest of the shifts along the 543 pulse-frequency axis.) The magnitudes of the shifts along the two axes are uncorrelated. 544 These results have multiple implications: 545 1. The reward-growth function responsible for oICSS has independent input-scaling 546 and output-scaling parameters and resembles a logistic. 547 2. The shifts along the pulse-frequency axis are due to an effect of the drug at, or 548 prior to, the input to the reward-growth function. 549 3. The shifts along the price axis are due to an independent effect of the drug at, or 550 beyond, the output of the reward-growth function. 551 4. The shifts along the pulse-frequency axis create grave problems for the hypothesis 552 that the rewarding effect produced by electrical stimulation of the MFB arises 553 solely from transsynaptic activation of midbrain dopamine neurons (the 554 series-circuit model).

555
The reward-growth function for oICSS 556 The effects of dopamine-transporter blockade on the reward mountain powerfully 557 constrain the underlying reward-growth function, and the functional form implied by 558 these results has far-reaching implications concerning the structure of brain-reward 559 circuitry. We develop this argument by first analyzing the form of the reward-growth 560 function underlying the reward-mountain model, then by demonstrating how the data 561 support this form, and finally by deriving the implications of this functional form for 562 the structure of brain-reward circuity.

563
The cross-sectional shape of the reward-mountain is determined principally by the The subjective-price function [8] bends the contour lines towards the horizontal at 569 very low prices but has no effect on the location of the reward mountain within the 570 coordinates defined by the independent variables (price and pulse frequency). As 571 estimated by Solomon et al. [39], this function converges on the objective price, and the 572 discrepancy between the subjective and objective values is within 1% once the objective 573 price exceeds 3.18 s. The subjective and objective prices can no longer be distinguished 574 once the objective price approaches the values of the P * obje location parameter (Tab S9). 575 Given the frequency-following function assumed here, frequency-following fidelity is 576 imperfect over much or all of the tested range of pulse frequencies. That said, firing 577 frequency falls only slightly short of the pulse frequency at pulse frequencies below the 578 F pulse hm values ( Fig S13). The procedure for generating the corrected values of the 579 parameter that locates the reward mountain along the pulse-frequency axis (F * pulse hm ) 580 is designed to remove the influence of imperfect frequency-following fidelity. We 581 emphasize that the correction procedure adjusts the estimates of the shifts but cannot 582 manufacture these out of whole cloth. The correction is driven by differences in the 583 location along the pulse-frequency axis of the surfaces fit to the vehicle and drug data. 584 Had the drug failed to displace the mountain surface, the correction would be zero. 585 Although the correction is likely imperfect, the estimated drug-induced displacement of 586 the reward mountain along the pulse-frequency axis (Tab 2) should be due largely to 587 the effect of the drug on the reward-growth function. 588 Unlike the case for eICSS [21][22][23]28], the reward-growth function for oICSS has not 589 yet been measured directly. The fact that a reward-mountain surface based on a logistic 590 reward-growth function fits the current data well provides one hint that this function 591 may indeed be logistic in form, or very similar. But there is a deeper sense in which the 592 results of the present study provide crucial new information about the reward-growth 593 function for oICSS: The displacement of the reward mountain along both the strength 594 and cost axes by dopamine-transporter blockade requires that the reward-growth 595 function for oICSS have independent location-scaling and output-scaling parameters, 596 like the logistic reward-growth function for eICSS. To explain why this is so, it is helpful 597 to reformat the logistic reward-growth equation (Eq S11), as follows:  Fig 9. Influence of input-and output-scaling parameters on reward-growth functions. The upper panels show logistic reward-growth functions, whereas the lower panels show power functions. Solid lines map the pulse frequency into the reward intensity using the assumed frequency-following function (see: Parameters of the frequency-following function for oICSS). F F max is the maximum firing frequency (F f iring max ) attainable given the form and parameters of the assumed frequency-following function. The dashed lines map the firing frequency into the reward intensity. (These lines are drawn assuming perfect frequency-following fidelity (F F max = ∞), and thus pulse frequency and firing frequency are equivalent.) In the case of the logistic reward-growth functions in the upper panels, changing the values of the input-and output-scaling parameters produces independent effects, shifting the reward-growth function in orthogonal directions. In contrast, changing the values of the input-and output-scaling parameters produces identical effects on the power-growth functions plotted in the lower panels.
What would happen if frequency-following fidelity remained the same, but the 607 input-scaling and output-scaling parameters of the reward-growth function were no 608 longer independent? We can simulate such a case by replacing the logistic 609 reward-growth function in Eq 1 with a power function: where K in = the input-scaling parameter K out = the output-scaling parameter This power-growth function can be rewritten as In contrast to the case of logistic growth, the two scaling constants act jointly, in an 612 inseparable manner, and exclusively on the output of the function. Changing either 613 scaling constant shifts the power-growth function along the Y axis but does not change 614 its position along the X axis (lower panels of Fig 9). In contrast to the logistic function 615 (upper panels), the two constants act as one. Fig S33 shows that changing the value of 616 the input-scaling parameter of the power-growth function moves the reward mountain 617 only along the price axis, and Fig S34 shows that changing the value of the 618 output-scaling parameter has the same effect.

619
The three power-growth functions in the lower panels of Fig 9 designated by solid 620 lines all bend at the same location along the X axis. This is because the 621 frequency-following function that translates pulse-frequency into firing frequency 622 (Eq S1, Fig S13)  field [70,71]. It follows that in the current experiment, a lower pulse frequency will 646 suffice to drive peak dopamine concentration to a given level under the influence of the 647 drug than in the vehicle condition. Transporter blockade would thus rescale upwards 648 the input to the reward-growth function. Such an effect is illustrated in Fig S16. The 649 amplification of the stimulation-induced dopamine release by GBR-12909 is represented 650 by the triangle labelled "K da ." The drug rescales upwards the impact of the phasic 651 stimulation-induced increase in the aggregate firing rate ("DA drive") on the input to 652 the S-shaped reward-growth function, which is thus shifted leftwards along the 653 pulse-frequency axis (Fig 9), dragging the reward-mountain surface downward along this 654 axis (Fig 6). New optical methods [72,73]  In addition to increasing the amplitude of stimulation-induced dopamine transients, 659 blockade of the dopamine transporter by GBR-12909 also increases the baseline 660 ("tonic") level of dopamine [45,70,71]. The increase in dopamine tone could rescale the 661 output of the reward-growth function upwards, reduce subjective effort costs and/or 662 diminish the value of activities that compete with pursuit of the optical reward. Such 663 effects could arise from increases in K rg and/or decreases in K ec or K aa (Fig S16).

664
Any combination of these effects could account for the observed rightward shifts of the 665 reward mountain along the price axis (Figs 6,S23-S28). This can be seen by 666 reformatting Eq S30 and expressing it verbally as follows: 667 P sube = maximum reward intensity subjective effort cost × value of competing activities (4) (Reward probability has been omitted from Eq 4 because the probability of reward upon 668 December 3, 2019 20/44 satisfaction of the response requirement was equal to one in the present study.)

669
Teasing apart the three alternative accounts of the drug-induced change in P * obje may 670 not be feasible on the basis of behavioral data alone; it will likely require identification 671 of the neural substrates underlying each of these influences and measurement of how 672 dopamine-transporter blockade influences signal flow within each of these circuits.

673
Implications for the series-circuit model of brain-reward circuitry 674 According to the series-circuit model (Fig 2), the rewarding effects produced either 675 by electrical stimulation of the MFB or by optical stimulation of midbrain dopamine 676 neurons arise from a common cause: activation of the dopamine neurons. The optical 677 stimulation excites these neurons directly, whereas the electrical stimulation activates 678 them transsynaptically. We have explained above that a reward-growth function with 679 independent input-scaling and output-scaling parameters lies downstream from the 680 dopamine neurons responsible for oICSS. If so, boosting the peak stimulation-induced 681 dopamine concentration by means of transporter blockade should shift the contour map 682 of the reward mountain downwards along the pulse-frequency axis in both cases. It does 683 not. Whereas the predicted shift is seen in six of seven cases in the current oICSS results, 684 reliable shifts in this direction are absent in all eight cases reported in the analogous 685 eICSS study [45]. Similarly, reliable, downwards shifts were absent in seven of eight pulse-frequency axis in rats working for electrical stimulation of the MFB but succeeds 697 in doing so in rats working for optical stimulation of the ventral midbrain? We doubt 698 that this is a viable explanation. To explain why we must first lay out in some detail 699 this argument for rescuing the series-circuit hypothesis.

700
Given that ChR2 is found throughout the excitable regions of the cell membrane and 701 that the optical probe is ∼10x larger than the soma of a dopamine neuron, the optically 702 generated spikes likely arise downstream from the somatodendritic region. If 703 dopamine-transporter blockade increased extracellular dopamine concentrations in the 704 somatodendritic region, this would increase binding of dopamine to D2 autoreceptors 705 and potentially decrease the sensitivity of these neurons to synaptic input. Such an 706 effect could well be bypassed in the case of oICSS due to the activation of the 707 ChR2-expressing neurons downstream from the somatodendritic region.

708
In order for such an explanation to account for the failure of GBR-12909 to shift the 709 reward mountain along the pulse-frequency axis in rats working for electrical 710 stimulation of the MFB, the putative autoreceptor-mediated inhibition would have to 711 exactly counteract the drug-induced enhancement of dopaminergic neurotransmission in 712 the terminal fields of those dopamine neurons that received sufficient excitatory 713 synaptic drive to overcome the influence of the D2 stimulation. (Recall that the reward 714 mountain was not shifted reliably along the pulse-frequency axis in either direction in 715 all eight subjects in the eICSS study.) Such exact counterbalancing seems unlikely. The impact of the dopamine transporter is very different in the somatodendritic and 717 terminal regions of dopamine neurons. The rate of dopamine reuptake is as much as 200 718 times higher [74] and dopamine-transporter expression from 3-10 times greater [74,75] 719 in the terminal region than in the somatodendritic region. In a study carried out in 720 guinea-pig brain slices, GBR-12909 failed to increase electrically induced release of 721 dopamine in the VTA cell-body region [76], in contrast to its potent augmentation of 722 release in terminal regions. This drug greatly boosted the amplitude of dopamine 723 transients recorded voltammetrically in the nucleus accumbens terminal field of rats 724 working for rewarding electrical stimulation of the VTA [70]. A recent study [77] shows 725 very similar release of dopamine in the nucleus accumbens of rats working for electrical 726 stimulation of either the VTA or of the MFB site used in the eICSS study of the effect 727 of GBR on the reward mountain [45], suggesting that GBR-12909 would boost release of 728 dopamine in the nucleus accumbens in response to electrical stimulation of the MFB 729 site employed in the study of the effect of dopamine-transporter blockade on the 730 position of the reward mountain [45]. Moreover, cocaine, which also blocks the 731 dopamine transporter, greatly increased release of dopamine in the nucleus-accumbens 732 shell in response to transsynaptic activation of midbrain dopamine neurons by electrical 733 stimulation of the laterodorsal tegmental area [78,79]. Taken together, the evidence 734 makes it highly unlikely that in the reward-mountain study by Hernandez et al. [45], 735 autoreceptor-mediated inhibition prevented GBR-12909 from boosting phasic release of 736 dopamine in the terminal fields of midbrain dopamine neurons. Thus, the challenge 737 posed by the present data to the series-circuit model stands, and alternative accounts 738 must be explored.

739
Toward a new model of brain-reward circuitry 740 To account for both the eICSS and oICSS data in the simulations, we propose 741 development and exploration of models in which the reward-intensity signals evoked by 742 electrical stimulation of the MFB and by optical stimulation of midbrain dopamine 743 converge on a final common path. In such models, distinct reward-growth functions 744 translate aggregate firing rate in the MFB neurons and the dopamine neurons into 745 reward intensity. One such model is shown in Fig 10. The two reward-intensity signals 746 (the output of the two reward-growth functions) converge onto the final common path 747 underlying the behaviors entailed in approaching and holding down the lever. Thus, we 748 are proposing that reward-related signals flow in parallel through a portion of the brain 749 reward circuitry subserving ICSS and that the reward-intensity signal in the MFB limb 750 of the circuit bypasses the midbrain dopamine neurons en route to the final common 751 path. Parallel pathways conveying reward-intensity signals to a final common path. The directly activated neurons subserving eICSS of the MFB are depicted at the left of the top row of symbols. The electrode also activates a second population of fibers, with lower frequency-following fidelity [80], that project directly or indirectly to midbrain dopamine neurons, which have yet lower frequency-following fidelity. Optical stimulation of midbrain dopamine neurons activates only the lower limb of the circuit. The outputs of the reward-growth functions in the two limbs converge (Σ) on the final common path for reward evaluation and pursuit.

752
Convergence models face a seemingly daunting challenge: electrical stimulation of 753 the MFB activates midbrain dopamine neurons [71,[80][81][82][83]. If so, one would expect such 754 stimulation to drive signaling in both of the hypothesized converging pathways.

755
Wouldn't this produce at least some displacement of the reward mountain along the  Asymptotic reward intensity is quite low. One reason for this is that the lower-limb 778 F pulse hm value is well within the roll-off range of the assumed frequency-following 779 function. Another is that frequency-following fidelity in the MFB neurons that generate 780 transsynaptic excitation of the midbrain dopamine neurons is poorer than in neurons In the upper row, the strength of the MFB drive is equivalent to optical stimulation more than sufficient to generate half-maximal reward intensity, whereas in the lower row, the MFB drive is equivalent to the strongest optical stimulation employed in the present study. The effect of increased tonic-dopamine signaling is presumed to arise from decreases in subjective effort costs or the value of alternate activities but not from upwards rescaling of reward-intensity. Thus the vertical asymptotes of the cyan curves are not altered by the drug.
The left shift along the pulse-frequency axis in Fig 11 is insufficient to overtake the 795 cyan curve for the upper (MFB) limb of the model. Thus, the summated curve for the 796 drug condition (dashed magenta lines in the upper middle and upper right panels) is 797 not shifted laterally with respect to the summated curve for the vehicle condition (solid 798 magenta lines). As a consequence, the simulated reward mountain does not shift along 799 the pulse-frequency axis ( Fig S35). Thus, the simulated output obtained using moderate 800 MFB drive on the dopamine neurons replicates the eICSS findings.

801
What would happen if the MFB drive were much stronger? We repeated the 802 simulations setting the MFB drive to the equivalent of an optical pulse frequency of 80 803 pulses s -1 , around the highest value tested in most of the subjects in the current study. 804 The results are shown in the lower row of Fig 11. The reward-growth curve for the lower 805 (dopaminergic) limb of the model (green curve) is now shifted leftwards in the vehicle 806 condition (lower-left panel) and rises to a higher asymptote. From its enhanced starting 807 position along the abscissa, the simulated reward-growth curve for the lower limb in the 808 drug condition is now able to overtake the reward-growth curve for the upper limb, The convergence models can account for a heretofore unexplained discrepancy 856 between dopamine release and self-stimulation performance. In rats performing eICSS 857 of the MFB, Cossette and colleagues traded off the stimulation current against the pulse 858 frequency [80]. As expected on the basis of the counter model, increases in pulse 859 frequency required that the current be reduced in order to hold behavioral performance 860 constant. In accord with a more detailed study of the frequency response of the MFB 861 neurons subserving eICSS [8], the required current continued to decline as the pulse 862 frequency was increased beyond 250 pulses s -1 . In contrast, stimulation-evoked 863 dopamine release in the nucleus accumbens, monitored by means of fast-scan cyclic 864 voltammetry, increased very little, or not at all, as the pulse frequency was increased 865 from 120 to 250 pulses s -1 and generally declined when the pulse frequency was 866 increased further to 1000 pulses s -1 . In the series-circuit model, midbrain dopamine 867 neurons relay signals from the directly activated MFB neurons to the behavioral final 868 common path. Thus, the trade-off between the induced firing frequency and the number 869 of directly activated MFB neurons recruited by the current should be manifested 870 faithfully in the stimulation-induced release of dopamine. It was not. The discrepancy 871 between the trade-off functions for eICSS and dopamine release is not explained by the 872 series-circuit model but is readily accommodated by the convergence model.

873
Huston and Borbély documented rewarding effects produced by electrical stimulation 874 of the lateral hypothalamic level of the MFB in rats that had undergone near-total 875 ablation of telencephalic structures [84,85]. Although the major telencephalic terminal 876 fields of the midbrain dopamine neurons had been damaged heavily, the rats learned to 877 perform simple movements to trigger delivery of the electrical stimulation. The 878 series-circuit hypothesis leaves these data unexplained, but the convergence model could 879 accommodate them. For example, hypothalamic or thalamic neurons that survived the 880 ablations might relay reward-related signals to brainstem substrates of the behavioral 881 final common path via MFB fibers, as Huston proposed.

882
Johnson and Stellar [86] made large, bilateral, excitotoxic lesions in the 883 nucleus-accumbens (NAC) and ventral pallidum (VP) of rats that had been trained to 884 perform eICSS of the lateral hypothalamic level of the MFB. The series-circuit model 885 predicts that such extensive damage to a key dopaminergic terminal fields should have 886 greatly reduced the rewarding effectiveness of the electrical stimulation. It did not. The 887 authors concluded that "while the NAC and VP have been shown to be important for 888 various kinds of reward," ... they do "not appear to be critical for the expression of 889 ICSS reward. It may be the case that the ICSS reward signal is processed downstream 890 from the NAC and VP and is therefore unaffected by total destruction of either 891 structure." That conclusion contradicts the series-circuit hypothesis but it is perfectly 892 compatible with, and indeed anticipates, the convergence model presented here. The essential role of modeling and simulation 894 It is common in behavioral neuroscience to make predictions and assess results on 895 the basis of binary, directional classification. The effect of a manipulation either does or 896 does not meet a statistical criterion. If it does, the value of the output variable is said 897 to go up or go down. The numeric values of the input and output variables typically 898 matter little.

899
The reward-mountain model is more ambitious. It makes predictions about the form 900 of the relationships between the variables. As is so often the case in biology, the 901 relationships are non-linear. There are ranges of an input variable, such as the pulse 902 frequency, over which an output, such as time allocation, changes little and other ranges 903 over which the output changes a lot. Thus, the snide answer to the question of whether 904 an output will change is "it depends," and the serious answer entails specifying this 905 dependency in terms of functional forms and their parameters. The numeric values 906 matter.

907
Albeit in a modest way, the reward-mountain model confronts the problem of 908 convergent causation, the fact that although behavioral output is highly constrained by 909 the relatively small number of muscles, joints and degrees of freedom in an animal body, 910 a very large number of neural controllers have access to the behavioral output. A given 911 level of reward pursuit may be directed at a large, but expensive, reward or at a small, 912 inexpensive one. The reward-mountain model distinguishes such cases by tying reward 913 pursuit to both the strength and costs of reward, and it goes beyond binary and 914 directional classification of effects by specifying the functions that map these physical 915 quantities into the subjective values that determine goal selection and behavioral 916 allocation.

917
The reward-mountain model is quantitative: given inputs consisting of reward 918 strengths and costs, it outputs time-allocation values. This output changes in specified, 919 lawful ways when internal parameters, such as the scaling of reward intensities, are 920 perturbed by manipulations, such as drug administration. A virtue of such an approach 921 is that it requires explicit statement of assumptions. A shortcoming, perhaps, is that 922 the specification of the numerous, unavoidable assumptions tends to elicit skepticism 923 that may be eluded by verbal formulations that allow assumptions to remain unstated 924 and implicit. In our view, it is best to make our ignorance manifest so as to incite 925 ourselves and our colleagues to reduce it. 926 Another sense in which we believe the quantitative specification of the model to be 927 important concerns the often-hidden perils of relying on verbal reasoning alone to Diagrams such as Fig 10 include a large number of components and may invite the 948 viewer to imagine William of Ockham turning in his grave. That said, parsimony entails 949 the jettisoning of superfluous entities. We invite the reader to identify which of the 950 components in the models can be tossed overboard without loss of explanatory power. 951 Can the data from previous eICSS experiments and the present oICSS experiment be 952 explained more simply? If we had found an affirmative answer to this question we 953 would have implemented it. Indeed, we see the models discussed here as over-simplified 954 rather than over-complicated. For example, they say nothing about fundamental 955 matters such as the functional specialization of dopamine subpopulations, how signals 956 from the "milieu interne" modulate the decision variables, or how the subjects learn and 957 update the reward intensities and costs that determine their behavioral allocation.

958
Nonetheless, we argue that the combination of the modeling, simulation and empirical 959 work provides a new perspective on the structure of brain-reward circuitry while likely reason is that the series-circuit model relegates these neurons to a subsidiary role 974 that has thus inspired little empirical investigation: on that view, the MFB fibers 975 merely provide an input, likely one of many [92,93], to the dopamine neurons ultimately 976 responsible for the rewarding effect.

977
The convergence model elevates the status of the directly activated neurons 978 subserving the rewarding effect of MFB stimulation. This model asserts that multiple, 979 partially parallel, neural circuits can generate reward and that the dopamine neurons do 980 not constitute an obligatory stage in the final common path for their evaluation and 981 pursuit. From that perspective, it is important to intensify the search for the limb(s) of 982 brain reward circuitry that may parallel the much better characterized dopaminergic 983 pathways. Application of modern tracing methods that integrate approaches from 984 neuroanatomy, physiology, optics, cell biology and molecular biology (e.g., [94]) may 985 well achieve what application of the cruder, older tools failed to accomplish. The 986 detailed psychophysical characterization of the quarry that has already been achieved, 987 particularly the evidence for myelination and axonal trajectory [10][11][12][13][14]89], can guide 988 the application of such methods. Remarkable success has been achieved in developing tools for specific excitation or 992 silencing of dopaminergic neurons, measuring the activity of these neurons, mapping the 993 circuitry in which they are embedded, and categorizing different functional 994 dopaminergic subpopulations. The neurobiological study of dopamine neurons has been 995 coupled to learning theory, neural computation, and psychiatry, with longstanding 996 application in the study of addictive disorders [6,16,95,96] and emerging linkage to the 997 analysis of depression [97,98]. Perhaps the brilliance of these successes has obscured the 998 possible roles played by other neurons and circuits in the functions in which The present study demonstrates that the reward-mountain model also provides a  We propose that alternatives to the series-circuit model be explored. In the one 1046 sketched here, the reward signal carried by the MFB axons runs parallel to the reward 1047 signal carried by the midbrain dopamine neurons prior to the ultimate convergence of 1048 these two limbs of brain-reward circuitry onto the final common path for the evaluation 1049 and pursuit of rewards. This proposal can accommodate findings unexplained by the 1050 series-circuit model and suggests a research program that could complement the work 1051 that has so powerfully and convincingly implicated dopaminergic neurons in reward.

1053
Subjects 1054 Seven TH::Cre, male, Long-Evans rats weighing 350 g at the time of surgery, served 1055 as subjects. Animals were obtained from a TH::Cre rat colony established from three 1056 sires generously donated by Drs. Ilana Witten and Karl Deisseroth. Upon reaching 1057 sexual maturity, animals were housed in pairs in a 12 h -12 h reverse light-cycle room 1058 (lights off at 8:00 AM). The rats were housed singly following surgery. at a volume of 0.5 µl, at three different DV coordinates (-8.2, -7.7 and -7.2

1075
Atropine sulfate (0.02-0.05 mg/kg, 1 mL/kg, Sandoz Canada Inc., Quebec) was injected 1076 s.c. to reduce bronchial secretions, and a 0.3 mL dose of penicillin procaine G (300 1077 000 IU/ml, Bimeda-MTC Animal Health Inc., Cambridge, Ontario) was administered 1078 SC, as a preventive antibiotic. "Tear gel" (1% w/v, 'HypoTears' Novartis) was applied 1079 to the eyes to prevent damage from dryness of the cornea. Anesthesia was maintained 1080 throughout surgery by means of isoflurane (1 − 2.5% + O 2 ). The head of the rat was 1081 fixed to the stereotaxic frame (David Kopf instruments, Tujunga, CA) by means of ear 1082 December 3, 2019 29/44 bars inserted into the auditory canal and by hooking the incisors over the tooth bar.

1083
Bregma and Lambda were exposed by an incision of the scalp. Three blur holes were 1084 drilled in the skull over each hemisphere   reduce spurious output noise) and adjusted through trial and error for each rat to elicit 1130 robust oICSS behavior (30-60 mW, measured at the tip of the patch-cord with the laser 1131 operating in continuous-wave mode). Animals were trained by means of the successive 1132 approximation procedure to depress the lever to receive optical stimulation (a 1 sec 1133 train of 5 ms optical pulses at 80 pulses per second (pps). After the rats had learned to 1134 lever press, they were allowed to work for the optical reward, on a Continous 1135 Reinforcement (CRF) schedule, during two 15 min trials. The total number of presses 1136 was recorded. Both of the implants were tested under these conditions; the implant that 1137 yield the larger number of presses was used for the rest of the experiment.

1138
Training in preparation for measurement of the reward 1139 mountain 1140 The rats received further training to prepare them for measurement of the reward 1141 mountain (Fig. S37). First, the rats were trained to perform a new reward-procuring 1142 response, holding down the lever rather than simply pressing it briefly. This new 1143 response was rewarded according to a cumulative-handling-time schedule of 1144 reinforcement [101], which delivers a reward when the cumulative time the lever has 1145 been depressed reaches an experimenter-defined criterion called the "price" of the 1146 reward. In this sense, the price corresponds to what economists call an opportunity cost. 1147 To earn a reward, the rats did not have to hold down the lever continuously until the 1148 price criterion was met; they could meet the criterion by means several bouts of lever 1149 holding separated by pauses. The onset and offset times of each bout of lever depression 1150 were recorded.

1151
Next, the values of one or both independent variable (pulse frequency, price) were 1152 varied sequentially from trial to trial, thus traversing the independent-variable space 1153 along a linear trajectory. Such a traversal is called a "sweep." Three types of sweeps 1154 were carried out. Frequency sweeps were carried out at a fixed price (initially 1-2 s).

1155
These sweeps consisted of 10 to 12 trials during which the rat had the opportunity to 1156 harvest as many as 60 rewards (except for rat BeChr19, who was allowed to harvest a 1157 maximum of 30 rewards per trial due to the unusual effectiveness of optical stimulation 1158 in this subject). Each reward was followed by a 2 s Black Out Delay (BOD) during 1159 which the lever was disarmed and retracted, and timing of the trial-duration was paused. 1160 The pulse frequency during the first two trials was set to yield maximal reward-seeking 1161 behavior. The first trial was considered a warm-up trial and was excluded from analysis. 1162 From the second trial onwards, the rewarding stimulation was decreased systematically 1163 from trial to trial by decreasing the pulse frequency in equal proportional steps. The 1164 range of tested pulse frequencies was selected as to drive reward-seeking behavior from 1165 its maximal to its minimal value in a sigmoidal fashion (S37B). Every trial was 1166 preceded by a 10 s Inter-Trial Interval (ITI) signaled by a flashing light. During the 1167 last 2 s of this period rats received priming stimulation consisting of a non-contingent, 1168 1 s stimulation train, delivered at the maximally rewarding pulse frequency. Rats 1169 performed one frequency sweep per session, consisting of ten trials (except for subject 1170 BeChr19 who was tested on 12 trials per frequency sweep).

1171
When the rats showed consistent performance across trials and sessions in the 1172 frequency-sweep condition, we introduced price sweeps into the training sessions. In the 1173 price-sweep condition, the pulse frequency was kept constant at the maximal value for 1174 each rat, but the cumulative work time required to harvest the reward (i.e. the price of 1175 the reward) was increased systematically across trials. The price was the same on the 1176 first two trials of each sweep. As in the frequency-sweep condition, the first trial was 1177 considered a warm-up trial and was excluded from analysis. Starting at the second trial 1178 December 3, 2019 31/44 of the sweep, prices were increased by equal proportional steps across trials. The prices 1179 were set by trial and error so as to yield a sigmoidal transition between maximal and 1180 minimal reward-seeking behavior as a function of price (Fig. S37B). Trial duration was 1181 set so as to allow the rats to harvest a maximum of 60 rewards per trial (30 in the case 1182 of BeChr19). The BOD, ITI, and priming were the same as in the frequency sweep.

1183
During price-sweep training sessions, the rats also performed a frequency sweep. The 1184 order of the sweeps was randomized across sessions.

1185
Radial sweeps were incorporated when performance on the price sweeps appeared 1186 stable. Along radial sweeps, the pulse-frequency was decreased and the price was 1187 increased concurrently across trials. Thus, frequency sweeps run parallel to the 1188 pulse-frequency axis in the independent-variable space, price sweeps run parallel to the 1189 price axis, and radial sweeps run diagonally. The radial sweeps were composed of ten 1190 trials; the first trial served as a warm-up and was identical to the second trial. From the 1191 second trial onwards, both pulse frequency and price were varied in equal proportional 1192 steps so as to yield a sigmoidal decrease in reward-seeking behavior over the course of 1193 the sweep (Fig. S37B). The trajectory of the vector defined by the tested pulse Rats received i.p. injections 90 min prior to behavioral testing. Vehicle (2.0 ml/kg) 1207 was administered on Mondays and Thursdays and GBR-12909 (20 mg/kg) on Tuesdays 1208 and Fridays. In each session, rats performed a frequency, a price, and a radial sweep in 1209 random order. Each sweep consisted of ten trials each (except for the frequency sweep 1210 for rat BeChr19, which consisted of 12 trials). The duration of each trial was set so as to 1211 allow rats to harvest a maximum of 60 rewards per trial (except for rat BeChr19, who 1212 was allowed to harvest a maximum of 30 rewards per trial due to his unusual proclivity 1213 to work for very high opportunity costs). Wednesdays and weekends were used as drug 1214 elimination days: no testing was conducted on these days, and the rats remained in the 1215 animal care facility. Ten vehicle and ten drug sessions were conducted with each rat.

1216
Calculation of time allocation 1217 The raw data were the durations of "holds" (intervals during which the lever was 1218 depressed by the rat) and "release times" (intervals during which the lever was extended 1219 but not depressed by the rat). Total work time included 1) the cumulative duration of 1220 hold times during a trial, and 2) release times less than 1 s. The latter correction was 1221 used because during very brief release intervals, the rat typically stands with its paw 1222 over or resting on the lever [101]. Therefore, we treat these brief pauses as work and  We restricted the flexibility of the models by fitting common values of T min and T max 1260 to the data from the two treatment conditions {vehicle, drug}. The rationale is that the 1261 factors causing T min to deviate from zero and T max to deviate from one tend to be be 1262 common across vehicle-and drug-treatment conditions. The main experimental 1263 question posed concerns the effect of the drug on the location parameters. Thus, these 1264 two parameters {F pulse hm , P obje } were always free to vary across treatment conditions. 1265 Variants of both the standard and CR models were produced in which all, some, or none 1266 of the a, CR, and g parameters were common across the two treatment conditions.

1267
Thus,twelve models were fit: four variants of the standard model responses emitted in each of a series of nine or ten trials lasting two minutes each. The 1293 pulse frequency was decreased systematically across trials: the highest pulse frequency 1294 was in effect during the first trial of the series (the "warm-up") and also on the second 1295 trial, and data from the first trial was excluded from analysis. Subsequently, the 1296 pulse-frequency decreased in equal logarithmic steps from trial to trial. A single 1297 pulse-frequency sweep was run in each of five to six sessions per day. In each session, 1298 the optical power (measured at the tip of the patch-cord with the laser operating in 1299 continuous-wave mode) was set to one of five values (1.87, 3.75, 7.5, 15 or

1348
• Tab S1. Definition of acronyms and symbols 1349 • Tab S2. The functions composing the reward-mountain model.

1358
-Parameters of the frequency-following function for oICSS.

1359
-Displacement of the shell: distinguishing two sources.

1360
-Correction of the location-parameter estimates for changes in 1361 frequency-following fidelity.

1362
• Model fitting and selection 1363 -Tab S3 The 12 candidate models fit to each dataset.

1364
-Tab S4 Model-evaluation statistics for the fit of the 12 candidate 1365 models to the data from rat Bechr29.

1366
-Tab S5 Summary statistics for the model that provided the best 1367 fit (highest evidence ratio) to the data for each rat.

1368
-Tab S6 Best-fitting models for all rats.

1382
• Tabs S10-S16 Parameter values from the best-fitting models for rats  The effect of dopamine transporter blockade on optical self-stimulation: behavioral and computational evidence for parallel processing in brain reward circuitry   The data for rat Bechr29 are shown in Fig 4 in the main text.

5
Derivation of the reward-mountain model 6 Acronyms and symbols employed in this supporting-information file are defined in Tab S1.  objective probability that a reward will be delivered upon satisfaction of the response requirement p sub subjective probability that a reward will be delivered upon satisfaction of the response requirement P e notation for P obje used in previous eICSS papers P * e shorhand notation for P * obj e P obj objective opportunity cost ("price") of a stimulation train P obj e objective price at which T = 0.5 when R = R max P * obj e estimated objective price at which T = 0.5 and R = R max if frequencyfollowing fidelity were perfect P sub subjective opportunity cost ("price") of a stimulation train P sub T =0.5 vector of subjective prices that hold time allocation midway between its minimum and maximum values P sub bend parameter controlling the abruptness of the transition from the "blade" to the "handle" of the subjective-price function P sub min minimal subjective pricė R aa average rate of reward from performance of alternate (leisure) activitieṡ R aaṘaa normalized to vary between 0 and 1 R bsr peak reward intensity achieved over the course of a stimulation train R bsr R bsr normalized to vary between 0 and 1 as R bsr rises from 0 to R bsr max R bsr max maximum normalized reward intensity R bsr T =0.5 vector of normalized reward intensities that hold time allocation midway between its minimum and maximum valueṡ R bsr rate of brain-stimulation reward; reward intensity divided by the subjective price paid to procure the pulse train  The reward-mountain model provides a framework for integrating the frequency-sweep, pulse-sweep, and 8 radial-sweep data (e.g., Figure 4) in a unified 3D space and for interpreting the drug-induced changes in the position 9 of the resulting 3D structure (the reward mountain). The following derivation of the model first extends earlier 10 depictions that were developed in the context of eICSS studies [1][2][3] and then adapts the model to accommodate 11 oICSS of midbrain dopamine neurons.

12
The reward-mountain model predicts time allocation given experimenter-controlled variables that determine the 13 strength and cost of the rewarding stimulation. In most experiments carried out to date, the strength variable is the 14 pulse frequency within a fixed-duration stimulation train, and the cost variable is the work time (opportunity cost, 15 price) required to procure a stimulation train. The functional machinery that generates time-allocation values from 16 these inputs is summarized in Tab S2. map the values of variables that are manipulated or controlled into inputs to the core of the model. The core functions (middle column) map these inputs into the payoffs from work and leisure activities, whereas the core → shell function map the payoffs into the observed dependent variable: time allocation. The accompanying Matlab ® Live Script illustrates how the listed functions are implemented. Please see Tab S1 for definitions of the symbols.
shell → core core core → shell As the table implies, we distinguish between the "shell" and "core" of the model. The shell consists of the 19 variables that are observed (time allocation), manipulated (pulse frequency, price), and controlled (stimulation 20 parameters held constant, physical work required to hold down the lever, affordances of the test environment [4]).

21
The shell is displayed within the space defined by the observed and manipulated variables.

22
The core (middle column of Tab S2) consists of the functions that compute the intensity of the reward produced 23 by the stimulation train and combine this value with the opportunity and effort costs to generate what we call 24 "payoffs." Parallel functions in the core compute the value of the alternate activities that compete with pursuit of the 25 stimulation for the rat's behavior. A set of functions (left column) provides the input to the core by mapping the 26 manipulated and controlled variables into the quantities from which payoffs are derived.

27
A single function (right column), based on the generalized matching law [5], translates the payoffs generated in 28 the core into the time-allocation values that are manifested in the shell.

29
Shell variables are objective, whereas core variables are inferred subjective quantities. The core functions are the 30 bridge between the objective inputs that are manipulated or controlled to the observed objective output, time 31 allocation; their form and parameters explain why the manipulated variables cause time allocation to vary in the 32 manner observed in the experiment.  psychophysical means and shown to be roughly scalar up to very high pulse frequencies [6], well beyond the typical 43 range of the F pulse hm values that locate the reward-mountain shell in the space defined by the two independent 44 variables. Solomon et al. [6] showed that the following function provides a good fit to the frequency-following data for 45 eICSS of the MFB: where f F = the frequency-following function F bend = parameter determining the abruptness of the roll-off in the frequency response; units: unitless F f iring = induced firing rate in the first-stage neurons; units: f irings s −1 neuron −1 F pulse = the pulse frequency; units: pulses s −1 F ro = the pulse frequency in the center of the roll-off region; units: pulses s −1 K F = unit-translation constant; units: f irings pulse −1 neuron −1 A plot of the frequency-following function is shown in Fig S13. The form of the function is the same as the one 47 described by Solomon et al. [6]. The choice of parameters is described below in section Parameters of the 48 frequency-following function for oICSS. neurons. Holding these variables constant, as is the case in most eICSS studies entailing measurement of the reward 52 mountain and in prior studies employing the curve-shift method [7][8][9], circumvents the need to make assumptions 53 about the spatial distribution of the directly-stimulated neurons subserving the rewarding effect and about their 54 excitability to extracellular stimulation.

55
Hawkins (cited in [10]) proposed that the number of directly-stimulated ("first-stage") neurons subserving the 56 rewarding effect is roughly proportional to the current, when pulse duration is held constant. Emprical 57 studies [11,12] show that the current required to excite a given first-stage neuron varies roughly as a rectangular,  Thus, we assume that subscript) [13]. This function returns the subjective probability that a reward will be delivered upon payment of the 66 price set by the experimenter. Over the range, 0.5 -1.0, this function was determined to be roughly scalar. In the 67 present study and in all other studies that have entailed measurement of the reward mountain, delivery of the reward 68 upon satisfaction of the response requirement is certain (p obj = 1).
where f p = the subjective-probability function p obj = objective probability that reward will be delivered upon satisfaction of the response requirement p sub = subjective probability that reward will be delivered upon satisfaction of the response requirement f P : Past measurements of the reward-mountain in eICSS studies employed the work time required to trigger delivery 70 of a stimulation train (the price) as the cost variable, and that practice is continued here. The function labeled f P

71
(upper-case subscript) maps the required work time, P obj , into the corresponding subjective variable, P sub . The form 72 and parameters of this function for eICSS of the MFB have been estimated by psychophysical means [14]. We showed 73 that the following psychophysical function more accurately describes the opportunity cost of rewarding brain 74 stimulation than the identity function or functions based either on hyperbolic or exponential discounting: where f P = the subjective-price function P obj = the "price" of a stimulation train: the cumulative time the lever must be depressed to trigger reward delivery; units: s P sub = the subjective price of a stimulation train; units: s P sub min = the minimum subjective price; units: s P sub bend = a constant that controls the abruptness of the transition from "blade" to "handle;" unitless . For a different view of the subjective-price function, please see [15].

76
At higher prices, the output of the subjective-price function defined by Eq S6 converges on its input: the 77 subjective price becomes indistinguishable from the objective one. However, as the objective price is reduced below 78 ∼3 s, the subjective price deviates from the objective price and eventually approaches an asymptotic value: P sub min . 79 This asymptote has been interpreted to arise from the reduction and eventual disappearance of competition between 80 lever depression and competing activities, such as grooming, resting, and exploring; performance of these competing 81 activities is no longer perceived as beneficial once the available time for their execution becomes sufficiently short.

82
A plot of the subjective-price function is shown in Fig S14. The parameters employed are the mean values 83 determined by Solomon et al. [14].
Fig S14. Subjective opportunity-cost ("price") function. The function maps the objective opportunity cost (cumulative lever-depression time required to trigger reward delivery), P obj ),into its subjective equivalent P sub . The form and parameters of this function are based on measurements by Solomon et al. [6].

84
The arguments of the first five shell → core functions are all variables controlled directly by the experimenter:

85
{F pulse , d, I, D, p, P }. These six variables are transformed by the first five shell → core functions into inputs to the 86 core. The arguments of the remaining two shell → core functions listed in Tab S2 are variables arising from features 87 of the test environment that the experimenter attempts to hold constant: the rate of physical work entailed in 88 holding down the lever or in performing activities that compete with pursuit of the rewarding stimulation. 89 f φ : The effort cost of the reward is the subjective rate of exertion entailed in holding down the lever. We know of no 90 psychophysical studies that reveal the form of this function. That said, we can assume that it accelerates steeply as 91 the effort cost approaches the physical capabilities of the rat. The objective effort cost has been held constant in this 92 and in prior studies carried out in the reward-mountain paradigm. We assume that the subjective effort cost did not 93 co-vary with the manipulated variables. To help ensure this in the current study, the lever was withdrawn for 2 s 94 following the triggering of a stimulation train so as to provide time for any interfering motoric consequences of the 95 stimulation to dissipate. 96 We treat the subjective rate of exertion required to depress the lever as a constant defined by: φ obj W = rate of physical work required to hold down the lever; units: Js −1 φ sub W = subjective rate of exertion required to hold down the lever in units we call "oomphs" s −1 K φ = unit-conversion constant; units: oomphs J −1 The dots overφ obj W andφ sub W signify that we define these quantities as rates. (No dots are placed over F f iring and 98 F pulse because doing so would be superfluous and potentially misleading. Frequencies are inherently rates over time 99 (the first time derivative of the pulse or spike number). By omitting the dots, we wish to avoid confusion between 100 these rates and their changes over time (the second derivative of the pulse or spike number).) 101 The effect of the drug, if any, on the rate of subjective exertion is defined as: where K ec W = proportional drug-induced change in the subjective rate of exertion required to hold down the lever; unitless In the vehicle condition, K ec W assumes an implicit value of one.

103
Activities such as grooming and exploring also entail performance of physical work. Thus, f φ is also applied to 104 these activities: wherė φ obj L = Average rate of physical work required to perform alternate ("leisure") activities; units: Js −1 . φ sub L = average subjective rate of exertion entailed in performance of leisure activities in oomphs s −1 . K φ = unit-conversion constant; units: oomphs J −1 We assume thatφ sub L does not covary systematically with the independent variables.

106
As in the case of the subjective rate of exertion entailed in work, we allow for drug-induced modulation of the 107 subjective rate of exertion entailed in performance of leisure activities.
where K ec L = proportional drug-induced change in the subjective rate of exertion required to hold down the lever; unitless In the vehicle condition, K ec L assumes an implicit value of one.

109
f a : We do not know which aspects of the leisure activities that compete with pursuit of BSR give rise to reward 110 signals in the brain nor how these signals are encoded. That is why both the arguments and output of the seventh 111 December 3, 2019 9/37 shell → core function (f a ) have been left blank. However, we do know that valuation of these activities influences the 112 allocation of time to pursuit of experimenter-controlled rewards: enrichment of the test environment shifts allocation 113 towards alternate activities and away from the experimenter-controlled reward [16]. Thus a function such as f a must 114 exist. We include f a in the list of shell → core functions for completeness and in recognition of this requirement.

115
Core functions 116 Core functions determine the reward rates produced by work (lever depression) and leisure (alternate) activities. 117 These are combined with the associated effort costs to yield a pair of payoffs, which are then passed to the 118 core → shell function for translation into time allocation. 119 f R : The core receives a set of spike trains as a result of the combined action of the first three shell → core functions 120 {f F , f N , f D }, one spike train from every activated first-stage neuron. According to the counter model [10,11,17], the 121 effects of the spike trains delivered by the individual first-stage neurons are summed, and thus, the intensity of the 122 rewarding effect is determined by the product of the number of activated first-stage neurons and the rate at which 123 they are fired by the stimulation train. This is why a Π symbol is used in the flow diagrams to represent the drive 124 produced by a pulse train of fixed duration on the scalar at the input of the reward-growth function (Fig 2). 125 The logistic form of the reward-growth function was described originally in operant-matching studies carried out 126 by Gallistel's group [18][19][20]. Shizgal [21] proposed the following expression for this function: (S11) where f R = the reward-growth function F f iring hm = firing rate required to drive reward intensity to half its maximum value; units: F pulse hm = pulse frequency required to drive reward intensity to half its maximum value; units: pulses s −1 g = the exponent that determines the steepness of rewardintensity growth as a function of pulse frequency K rg = reward-growth scalar; units: hedons R bsr = reward intensity produced by F f iring ; units: hedons R bsr = normalized reward intensity produced by a pulse frequency of F pulse , which, in turn, produces a firing frequency of F f iring in each first-stage neuron; unitless. 0 ≤ R bsr ≤ 1 According to Eq S11, When frequency-following fidelity is sufficiently high, the normalized reward intensity R bsr will approach one at 129 high pulse frequencies. The lower the value of the location parameter F f iring hm and the higher the value of the 130 reward-growth exponent (g), the easier this will be to achieve.

131
To accommodate the predicted rescaling of the input to the reward-growth function for oICSS by 132 dopamine-transporter blockade, a scalar, K da , is added to Eq S11 in the drug condition of the experiment:

133
December 3, 2019 10/37 where C = chronaxie: train duration at which F f iring hm is twice the value of ρ Π ; units: s ρ Π = aggregate rate of firing required to produce a reward of half-maximal intensity when the train duration is infinite; probability-weighted rate of brain-stimulation reward and the subjective rate of exertion required to hold down the 148 lever (see [23]): or as a benefit/cost ratio: In the case of BSR, there is solid evidence that aggregate impulse flow in the first-stage neurons encodes the 151 signal that will be translated into the intensity of the reward [17,20]. Although the identity of the first-stage neurons 152 subserving eICSS of the MFB (or any other brain site) remains unknown, directly-driven MFB neurons with 153 properties that match the psychophysically derived portrait of the first-stage fibers have been observed by means of 154 electrophysiological recording [24][25][26].

155
The argument of the reward-growth function for leisure activities is left blank, thus signifying our ignorance of 156 how pursuit of these activities is encoded by the brain and translated into a reward rate. That said, the considerable 157 evidence that the value of leisure activities competes effectively with experimenter-controlled rewards [5,16,[27][28][29] 158 implies that the payoffs from work and leisure are commensurable. Accordingly, we define a reward rate for the 159 leisure activities that compete with pursuit of BSR: unitless. 0 ≤˙ R aa ≤ 1 ( ) = the unknown variables that give rise toṘ aa The payoff from these alternate activities is computed in a manner analogous to the computation of the payoff from 161 BSR, as a ratio of reward and subjective-exertion rates: where U L = payoff from leisure activities; units: hedons oomph −1 The core → shell function 163 f T : The payoffs from pursuit of BSR (U W ) and engagement in leisure activities (U L ) are used by the sole 164 core → shell function to compute the allocation of time to pursuit of BSR. This behavioral-allocation function is 165 derived from the single-operant matching law [5,30,31]: where a = price-sensitivity exponent; unitless T = time allocation; unitless T max = maximum time allocation; unitless T min = minimum time allocation; unitless U W = payoff from pursuit of rewarding brain stimulation ("work"); units: hedons oomph −1 U L = payoff from pursuit of alternate ("leisure") activities; units: hedons oomph −1 Even when the rat is working maximally, latency to depress the lever is typically greater than zero, and T max is thus 167 typically less than one. The rat tends to sample the lever at trial onset, even when the payoff from brain stimulation 168 is low. Thus, T min is typically greater than zero.

169
The single-operant matching law was formulated initially to account for the behavior of subjects working on 170 variable-interval schedules of reinforcement. The cumulative handling-time schedule in force in the present study [32] 171 is more akin to a fixed-ratio schedule in that the number of rewards earned is strictly proportional to time worked. 172 On such schedules, time allocation shifts more abruptly than on variable-interval schedules as the value of the 173 experimenter-controlled reward is varied. In the original version of the single-operant matching law [30,31], the payoff 174 terms are not exponentiated. In contrast, the price-sensitivity exponent (a) in Eq S21 allows the reward-mountain 175 model to account for time-allocation shifts of varying abruptness.

176
To simplify the remaining derivation of the reward-mountain model, we define a normalized measure of time 177 allocation: Substituting from Eq S22 in Eq S21, we obtain Time allocation as a function of reward strength and cost 180 We are now in a position to tie the dependent variable, time allocation (T ), to the two independent variables, 181 pulse frequency (F pulse ) and price (P obj ). Substitution for U W and U L from Eqs S17 and S20, respectively, in Eq S22 182 yields: An initial step towards simplifying Eq S25 is to multiply each of the terms on the right by To simplify Eq S25 further, we first define T mid as the time-allocation value midway between maximal and 186 minimal time allocation: According to Eqs S22 and S23, We now hold time allocation at T mid and drive reward intensity to its maximum value R bsr = R bsr max .

188
Substituting for U W from Eq S17 and reversing Eq S28, we obtain where P obj e = objective price at which T = T mid when R bsr = R bsr max P sub e = subective price at which T = T mid when R bsr = R bsr max Rearranging the terms yields For consistency with previous papers, we use the subscript, "e" in the symbol P sub e to refer to the fact that the 191 payoff from brain stimulation equals the payoff from alternate activities ("everything else") when the normalized 192 reward intensity is maximal ( R bsr = R bsr max ) and the subjective price (P sub ) equals P sub e . This equivalence 193 between the two competing payoffs is what drives time allocation to the half-way point (T mid ) between its minimal 194 (T min ) and maximal (T max ) values.

195
Rearranging Eq S30, we obtain: Substituting in Eq S26 from Eqs S11, S24, and S31, we obtain By setting R bsr = R bsr max and P sub = P sub e in Eq S32, it can be seen readily that T = 0.5, thus satisfying the 198 definition of P sub e as the price at which time allocation to pursuit of a maximally intense reward is halfway between 199 T min and T max .

200
P sub and R bsr must trade off to hold T at a given level. The lowest attainable subjective price is the value 201 corresponding to an objective price of zero, which we will call P sub 0 . (Negative values of P obj may be required to 202 drive P sub to P sub min .) Consider a vector of subjective prices, P sub T , that extends from P sub 0 to the highest tested 203 value of P sub and a corresponding vector of normalized reward intensities, R bsr T where P sub 0 = the subjective price corresponding to an objective price of zero The higher the subjective price, the higher the normalized reward intensity required to hold time allocation constant. 206 When T = 0.5, Eq S33 reduces to: Below (see: Contour lines: the trade-off between pulse frequency and price to hold time allocation constant), we use 208 Eqs S33 and S34 to obtain the equation for the contour lines that provide a two-dimensional description of the 209 reward-mountain surface.

210
To complete the derivation of the reward-mountain model, we now substitute for T in Eq S23 and expand Eq S32 211 so that time allocation is expressed in terms of the independent variables: price (P obj ) and pulse frequency (F pulse ), 212 which appear as the arguments of the subjective-price and frequency-following functions, respectively: The conditioned-reward variant of the reward-mountain model 214 The six-parameter version of the reward-mountain model incorporates Eqs S1, S6, and S35. The fitted parameters 215 are a (the price-sensitivity exponent), F pulse hm (the pulse frequency at which reward intensity is half maximal), g 216 (the reward-growth exponent) P obje (the price at which time allocation to pursuit of a maximal reward falls midway 217 between its minimal and maximal values), T min (minimum time allocation), and T max (maximal time allocation).

218
The seven-parameter version includes an additional parameter, C r , to reflect conditioned reward. This seventh 219 parameter reflects a learned value, above and beyond the payoff from the stimulation train, associated with the lever 220 and/or the act of holding it down. The paper that introduced this parameter [ where C r = the conditioned reward, expressed in terms of the equivalent pulse frequency Note that the way the C r parameter was incorporated into Eq S36 causes this parameter to interact with both 223 the a (Eq S35) and g (Eq S36) parameters. That form of the model failed to yield consistently converging fits when 224 applied to the current dataset. To address this problem, we altered the way that the C r parameter is incorporated 225 into the reward-growth function (Eq S11) so as to reduce its interaction with the other parameters: The resulting reward-mountain surface is produced by substituting the expression for R bsr from Eq S37 in Eq S32, 227 as follows: The reward-mountain surface shows the observed behavioral output (time allocation, T ) as a function of the two 233 independent variables, the price (P obj ) and pulse frequency (F pulse ). These three variables constitute the shell of the 234 reward-mountain model, together with the controlled variables: the work rate required to hold down the lever 235 (φ obj W ), the leisure activities afforded by the test environment, and the average work rate required to perform these 236 Minimal changes were made to adapt the model prior to fitting the reward-mountain surface to the data: 244 1. New values were used for the parameters of the frequency-following function (Eq S1) so as to accommodate the 245 known properties of midbrain dopamine neurons and the optical-power versus pulse-frequency trade-off data 246 reported here.  paradigm [3,33].

260
Parameters of the frequency-following function for oICSS 261 The form of the frequency-following function (f F ) for channelrhodopsin-2-mediated excitation of midbrain 262 dopamine neurons has yet to be determined. Here, we used a function of the same form as the one we had determined 263 previously for eICSS of the MFB [6], but we substituted new parameter values. Dopamine neurons cannot fire nearly 264 as fast as the directly stimulated neurons subserving the rewarding effect of electrical MFB stimulation [6,34,35], and 265 the kinetics of channelrhodopsin-2 are slow in comparison to those of the voltage-gated channels responsible for 266 electrically induced neural firing [36]. Thus, frequency-following parameters determined for eICSS of the MFB cannot 267 be used to account for the frequency response of the dopamine neurons subserving oICSS of the ventral midbrain.

268
Although two studies found that frequency-following fidelity in optically stimulated midbrain dopamine neurons 269 fell to only 40-50% at optical pulse frequencies of 40-50 pulses s -1 [34,35], results of two other electrophysiological 270 studies show very good firing fidelity at 50 pulses s -1 [37,38]. Moreover, results of a recent electrochemical study [39] 271 show that optically induced dopamine release in the nucleus accumbens continued to rise as the pulse frequency was 272 increased from 40-50 pulses s -1 . Poor frequency-following fidelity was found by Lohani and colleagues in midbrain 273 dopamine neurons optically stimulated at 100 pulses s -1 [40], suggesting that the upper limit on the induced firing 274 rate lies at a significantly lower pulse frequency.

275
Figs 3 and S5 show two cases (data from rats BeChR29 and 27) in which the behavioral effectiveness of the 276 stimulation continues to rise at pulse frequencies up to, or beyond, 60 pulses s -1 . The curves for the remaining rats 277 approach asymptote earlier, but this does not necessarily reflect failure the higher pulse frequencies to increase firing: 278 saturation of reward-intensity growth [20] could be responsible instead.

279
In view of the power-frequency trade-off data reported here and the results reported in the studies cited above, we 280 set the middle of the roll-off region of the frequency-following function (F ro ) to 50 pulses s -1 and the parameter 281 governing the abruptness of the roll-off (F bend ) to 20. The resulting frequency-following function is shown in Fig S13. 282 It continues to climb up to optical pulse frequencies of ∼100 pulses s -1 to attain a maximum induced firing rate of 283 51.6 spikes s -1 . The purpose of the experiment is to draw inferences about brain-reward circuitry from the effect of 286 dopamine-transporter blockade on the location of the shell of the reward mountain within the space defined by the 287 two independent variables, the objective opportunity cost of the reward, P obj , and the pulse frequency, F pulse .

288
However, the core variables about which the inferences are to be drawn operate in spaces defined by the subjective 289 opportunity cost, P sub , and the induced frequency of firing, F f iring (Tab S2). As we explain below, the 290 core → shell function, f F , which maps the pulse frequency into the evoked frequency of firing, contributes to the 291 estimates of both location parameters of the shell. Its influence must be removed in order to isolate the effects of the 292 drug on F f iring hm and P sub e , the location parameters of the core. In the depiction and interpretation of the fitted 293 surfaces, we will distinguish between 294 • primary displacement of the shell of the reward mountain due to drug actions on the core components and

295
• secondary (additional) displacement of the shell due to the differential response of the frequency-following In the following section, we explain how to remove the secondary displacements from the location-parameter 298 estimates, thus isolating the primary displacements that are the focus of the study.

299
Correction of the location-parameter estimates for changes in frequency-following fidelity 300 Fitting the mountain surface to time-allocation data returns the two location parameters of the reward-mountain 301 shell: P obj e , which positions the shell along the price axis, and F pulse hm , which positions the shell along the 302 pulse-frequency axis. Whereas the value of F pulse hm is independent of the value of P obj e , the reverse does not hold 303 when frequency-following fidelity is imperfect. Changes in frequency-following fidelity alter the maximum reward 304 intensity ( R bsr max ) , which contributes to the value of both P obj e and its subjective equivalent, P sub e . The portion 305 of the shifts in P obj e due to changes in frequency-following fidelity must be removed in order decouple estimates of 306 that parameter from shifts along the pulse-frequency axis. Once P sub e has been suitably corrected, manipulations 307 that act at, or beyond, the output of the reward-growth function (f R , Fig S16) shift the mountain core uniquely 308 along the price axis, whereas manipulations that alter the input to the reward-growth function shift the mountain 309 core uniquely along the frequency axis [2,3,41].

310
Imperfect frequency-following fidelity can also alter the extent to which the surface of the mountain shell shifts 311 along the pulse-frequency axis. This problem will arise if frequency-following fidelity differs in the drug and vehicle 312 conditions. Correction is required in order to estimate the shift of fundamental interest, which is the displacement of 313 the reward-growth function (a core component) along its firing-frequency axis.

314
The frequency-following function. The correction of the location-parameter estimates arises from the form of 315 the frequency-following function (f F ). The form and parameters of this function for eICSS of the MFB were 316 described by Solomon et al. [14]. In the absence of analogous data for oICSS, we have assumed the same functional 317 form but have tuned the parameters to accommodate the power-frequency trade-off data reported here and results of 318 prior studies [37][38][39][40].

319
In double-logarithmic coordinates, the frequency-following function (Fig S13) has the form of an inverted hockey 320 stick, with a straight handle that transitions into a flat blade [6]. Pulse frequency is represented along the abscissa of 321 this function, and firing frequency is represented along the ordinate. The induced firing frequency follows the pulse 322 frequency perfectly over the portion of the handle that is truly straight, grows more slowly over a transition zone, and 323 levels off over the blade portion. Thus, within the transition zone, a given drug-induced decrement in firing frequency 324 (a core variable) will correspond to a larger decrement in pulse frequency (a shell variable). Drug-induced 325 displacement of the reward-growth function (f R ) towards lower values of F f iring hm improves frequency-following 326 fidelity (by moving the pulse frequency toward, or onto the straight "handle"). This causes the displacement of the 327 shell to exceed, and thereby overestimate, the underlying displacement of the reward-growth function.

328
To correct our estimate of how far the reward-growth function has shifted along the pulse-frequency axis, we need 329 to decouple its value from the maximum normalized reward intensity, R bsr max . This is achieved by using the 330 assumed frequency-following function (Eq S1) to estimate F f iring hm from F pulse hm . and f F (F pulse hm ) are one and the same. We will plot the value in question in the 333 coordinate space of the shell. There, the pulse frequency, rather than the firing frequency, serves as the ordinate.

334
Thus, in that context, we use F * pulse hm in lieu of f F (F pulse hm ) as our notation.

335
f F (F pulse hm ) (and thus F * pulse hm ) is a value along the ordinate of the frequency-following function (Fig S13), 336 whereas F pulse hm is a value along the abscissa. Given the form of the frequency-following function, 337 F * pulse hm < F pulse hm once pulse frequency exceeds the capacity of the neurons to fire reliably to each and every pulse. 338 Several steps are required to correct the the parameter that locates the reward mountain along the price axis so 339 that it too is decoupled from frequency-following fidelity. The step first is analogous to the estimation of F * pulse hm 340 from F pulse hm : We use the subjective-price equation (Eq S6) to transform P obj e into its subjective equivalent, P sub e . 341 The next step is to correct P sub e for the effect of imperfect frequency-following fidelity. Eq S30 can be rearranged 342 as follows: Eq S40 reminds us that P sube is proportional to the maximum normalized reward intensity that can be attained, 344 R bsr max . Eq S12 defines R bsr max in terms of the maximal attainable firing frequency, F f iring max , the firing 345 frequency corresponding to F pulse hm , and the reward-growth exponent, g. To estimate F f iring max , we solve Eq S1 346 for a pulse frequency more than high enough to drive firing frequency to its maximum (F pulse max = 1000 pulses s -1 ). 347 We then use the resulting estimate of R bsr max to produce a revised estimate of P sub e : where P * sub e = estimated value that P sub e would have attained had frequency-following fidelity been perfect Last, we transform P * sub e into its objective-price counterpart, P * obj e by passing P * sub e through the back-solution of 349 the subjective-price equation [6]: for P * sub e ≥ P sub 0  Table S3. The 12 candidate models fit to each dataset. Values of the " free" parameters were free to differ between the vehicle and drug conditions, whereas a single value was fitted to the data from both conditions in the case of " com" (common) parameters. Additional columns list the total number of "Common" and "Free" parameters along with their "Totals." Models 2,5,8,and 11 are based on the six-parameter version of the reward-mountain model (Eq S35), whereas the remaining models are based on the seven-parameter version (Eq S38).
Num a free g free CR free CR com Common Free Total   Table S5. Summary statistics for the model that provided the best fit (highest evidence ratio) to the data for each rat. The models in the "Model Num" column are defined in Tab S3. The residual sum of squares is listed in column "RSS," the total sum of squares in column "TSS," and the adjusted R 2 in column "Adj R 2 ."  Bechr14,19,26,29), the best-fitting model was one in which common values of the a and g parameters were fit to the 372 data from both the vehicle and drug conditions; in the remaining three cases, the best-fitting model was one in which 373 the values of a, g, or both were free to vary across the vehicle and drug conditions.

374
In four cases (Bechr19, 26,27,28), the C r parameter was included in the best-fitting model, whereas in the three 375 remaining cases, it was not. As Fig S15 illustrates, the C r parameter will be advantageous when time allocation in 376 low-payoff trials is higher along pulse-frequency sweeps than along price or radial sweeps. In no case was the value of 377 this parameter free to vary across the vehicle and drug conditions in the best-fitting model. Thus, there is no Table S6. Best-fitting models for all rats. The models are described in Tab S3. Values of the " free" parameters were free to differ between the vehicle and drug conditions, whereas a single value was fitted to the data from both conditions in the case of " com" (common) parameters. Additional columns list the total number of common and free parameters and their totals

Rat
Model num a free g free CR free CR com Common Free Total         The corresponding graphs for rat Bechr29 are shown in Fig 6 in  conditions. In six of seven cases, a lower pulse frequency sufficed to produce a reward of half-maximal intensity under 386 the influence of dopamine-transporter blockade than in the vehicle condition.

387
The F pulse hm estimates vary over more than a doubling range in both the drug and vehicle conditions.

388
Particularly in the vehicle condition, the higher values fall within a range over which the assumed frequency-following 389 function (??) rolls off, thus preventing the normalized reward-growth function from approaching a value of one at the 390 highest pulse frequencies tested. This is why the uncorrected ( F pulse hm ) and corrected (F * pulse hm ) values in Tab S7) 391 differ. For example, the ∼31 pulses s -1 that produced a reward of half-maximal intensity in rat Bechr29 in the vehicle 392 condition are estimated to have generated only ∼26 firings s -1 .

393
Tab 2 in the main text shows the estimated drug-induced shifts in the location of the reward-mountain core 394 along the frequency axis.

395
Eqs S13,S14 express the effect of the drug on the location parameter of the reward-growth function as a divisor: 396 the more the drug boosts dopamine release, the lower the value of the location parameter for the drug condition and 397 thus the farther the reward-growth function is shifted to the left. The rightmost column in Tab 2 lists the values of 398 this divisor implied by the drug-induced shifts in the position of the reward mountain along the pulse-frequency axis. 399 By analogy to Eq S14,  " columns, will be lower than the uncorrected values, which are shown in the "F pulse hm " columns. The corrected values in the "F * pulse hm " columns are the estimated firing frequencies induced by the pulse frequencies in the "F pulse hm " columns, derived from the frequency-following function described in section ??. The values listed in the "Veh" columns are from the fits to the data acquired in the vehicle condition, whereas the values in the "Drg" columns are from the fits to the data acquired under the influence of GBR-12909.  [3,33,43,44], changes in the location of the fitted 401 surface along the price axis have been attributed to variables acting at, or beyond, the output of the reward-growth 402 function, whereas changes in the location of the fitted surface along the pulse-frequency axis have been attributed to 403 variables acting at, or prior to, the input to the reward-growth function. The reward-mountain model treats these 404 two sets of changes as independent, a postulate that is largely supported by empirial findings [1,2,13]. This 405 interpretation is valid as long as the induced firing frequency can be driven high enough to maximize reward intensity. 406 Tab S8 shows that this assumption does not hold in several of the datasets from the present study: The maximum 407 normalized reward intensity, R bsr max , is substantially less than one in these cases.

408
The deviation of R bsr max from one is generally greater in the vehicle data than in the drug data. In such cases, a 409 December 3, 2019 25/37 Table S8. Estimates of the maximum normalized reward intensities in the vehicle and drug conditions. The values in " R bsr max Drg" and " R bsr max Veh" columns are the maximum normalized reward intensities in the vehicle and drug conditions, respectively, based on the frequency-following function described in section ??. The ratios of these values (Drg / Veh) are listed in the "Ratio" column, and the common logarithms of the ratios in the "log(Ratio)" column. the reward-mountain surface into a range of pulse frequencies over which the fidelity of frequency following is better 411 than in the vehicle condition. That contribution to the change in the value of the P obj e parameter reflects mitigation 412 at the input to the reward-growth function (Eq S11), thus undermining the independence of the changes in the 413 parameters that locate the reward mountain along the price and frequency axes. That is why we computed corrected 414 estimates (P * obj e ), as described in section Correction of the location-parameter estimates for changes in 415 frequency-following fidelity thus compensating for the differences in the value of R bsr max across the vehicle and 416 drug conditions. The estimates of P obj e and P * obj e are shown in Tab S9.
417 Table S9. The position of the reward mountain along the price axis. The P obj e parameter determines the position of the reward mountain along the price axis. When the maximum normalized reward intensity differs between the drug and vehicle conditions due to differences in frequency-following fidelity, the value of the P obj e parameter is affected. The values listed in the "P * obj e " columns have been corrected to remove this effect. They show the estimated value that the P obj e parameter would have attained had frequency-following fidelity been perfect.   The reward-growth function for oICSS 423 Fig S30. Changing the value of the input-scaling parameter, F hm , shifts the mountain along the pulse-frequency axis. F hm is shorthand for F pulse hm Fig S31. Changing the value of the output-scaling parameter, K rg , shifts the mountain along the price axis. P e is shorthand for P obj e

Rat
Illustration of the correction for imperfect frequency-following fidelity 424 When the parameter that locates the reward mountain along the pulse-frequency axis (F pulse hm ) falls within the 425 range over which the induced firing frequency diverges substantially from the pulse frequency, then the magnitude of 426 a drug-induced shift along the pulse-frequency axis is exaggerated, and a fictive shift is produced along the price axis. 427 Fig S32 illustrates this effect and its removal by means of the correction procedure. The dot-dash cyan contour line 428 shows the objective-price and pulse-frequency values that would have driven time allocation halfway between its 429 minimal and maximal values had frequency-following fidelity been perfect. In contrast, the solid, wide, black contour 430 line shows the equivalent objective-price and pulse-frequency values given the assumed frequency-following function. 431 Note that the dot-dash cyan line deviates more from the solid, wide, black line in the simulated vehicle data in the 432 upper left and lower right quadrants than in the simulated drug data in the lower-left quadrant. This is so because 433 the uncorrected location-parameter value (vehicle: ∼ 39 pulses s -1 ; drug: ∼ 18 pulses s -1 ) is much closer to the 434 estimated maximum attainable firing frequency (51.67 firings s -1 ) in the vehicle condition than in the drug condition. 435 As a result, the simulated firing rate falls further below the pulse frequency in the vehicle condition than in the drug 436 condition. The dot-dash lines superimposed on the bar graphs show the result of correcting the location-parameter 437 shifts for this effect.  Here, we derive the equation for the contour lines, thus updating an earlier derivation [14] in which it had been 447 assumed that the higher pulse frequencies tested drive reward intensity to its maximum attainable value ( R max → 1). 448 This will indeed be so if the pulse frequencies in question are substantially lower than the frequency-following limit in 449 the directly stimulated neurons. That assumption was usually justified in previous eICSS studies in which the reward 450 mountain was measured [2,3,6,13,14,33,43,44]. Highly excitably MFB neurons served as the directly activated 451 substrate for the rewarding effect in those studies. In contrast, midbrain dopamine neurons are directly activated 452 substrate in the current study. Not only do these neurons have more limited frequency-following abilities than their 453 MFB counterparts [38,40], their activation is due to optical excitation of a relatively slow opsin [36] rather than to 454 electrical excitation of voltage-sensitive membrane channels. As we show below, it is likely that the highest pulse 455 frequencies employed in the present study did not always succeed in driving reward intensity to its maximum, 456 particularly in the vehicle condition. To accommodate such cases, we now generalize the previously published 457 expression for the contour lines [14]. 458 We begin by reformatting Eq 33 from the main text as follows: where a = price-sensitivity exponent P obj cont = vector of prices for which T = T cont when R is a corresponding element of R cont (S46) P obj e = objective price at which T = T mid when R = R max f P (P obj cont ) = subjective equivalent of P obj cont f P (P obj e ) = subective price at which T = T mid when R = R max R cont = vector of normalized reward intensities for which T = T cont when P obj is a corresponding element of P obj cont ; 0 ≤ R ≤ 1 R max = maximum normalized reward intensity T cont = time allocation represented by the contour line T cont = normalized time allocation represented by the contour line; Substituting for R from Eq 12 in the main text, we obtain: We now multiply both sides by f F ( F pulse cont ) g + f F (F pulse hm ) g , yielding and we then expand the right side to yield: December 3, 2019 32/37 The terms that include f F ( F pulse cont ) g are collected on the left side and the left side is factored to yield: Re-arranging the terms, we obtain: The numerator and denominator of the right side are now multiplied by f P (P obj e ) × 1− T cont T cont The contour graphs will be plotted in double logarithmic coordinates. In that space, Eq S53 becomes When time allocation falls halfway between T min and T max (T = T mid ; T = 0.5), Eq S54 reduces to: Expanding the right side, we obtain When R max = 1, Eq S56 reduces to 465 log 10 f F ( F pulse T =0.5 ) − log 10 f F (F pulse hm ) = 1 g × log 10 f P ( P obj T =0.5 ) − log 10 f P (P obj e ) − f P ( P obj T =0.5 ) This folder contains three files: • The Live Script: GBR_eICSS_oICSS_v10.mlx • A non-executable HTML version of the Live Script that can be viewed in a browser.
• A .zip archive of the 17 graphics files that the Live Script will import: Imported_figures.zip The setup of the Live Script is described in the section entitled "Preliminaries" and is implemented in lines 4-24.
The Live Script should run on any version of Matlab from R2018a onwards. The script hasn't been tested on earlier versions, but it may run on versions as far back as R2016a.
Execution of the entire script can take several minutes on a reasonably fast system. It may prove most practical to work through it in sections via the "Run Section" and "Run and Advance" buttons of the Live-Script editor.
Most variables are cleared at the end of each major section. The ones that are required throughout are all defined before line 316 ("Moving the mountain: validation studies").
Once the script has run to that point, the user can navigate directly to sections of interest via the "Go To" button, e.g.
• line 599: "The significance of orthogonal shifts" • line 870: "At what stage of processing does perturbation of dopaminergic neurotransmission alter reward seeking in eICSS?" • line 878: "Optical intracranial self-stimulation of midbrain dopamine neurons" • line 1030: "Dopaminergic modulation of subjective effort cost" • line 1122: "The series-circuit model of eICSS and oICSS" • line 1406: "The convergence model" MATLAB Live Script to supplement "The effect of dopamine transporter blockade on optical self-stimulation: behavioral and computational evidence for parallel processing in brain reward circuitry" This document was not intended to stand alone; It is assumed that anyone working through this script has already read the manuscript, with particular attention to the derivation of the reward-mountain model in the supporting-information file.

Preamble
In the current experiment, rats worked for optical stimulation of midbrain dopamine neurons. Time allocated to reward seeking was measured as a function of the strength and cost of the stimulation. We call the resulting three-dimensional data structure the "reward mountain." Dopamine neurotransmission was perturbed by administration of a dopamine-transporter blocker, and the consequent displacement of the reward mountain was determined.
The reward-mountain model was developed to account for data from experiments on rats working for rewarding electrical brain stimulation (electrical intracranial self-stimulation: eICSS). This model incorporates information obtained over many decades of research on the neural circuitry that intervenes between the tip of the stimulating electrode and the behavioral effects of the stimulation. Although this work has been dogged by uncertainty about the identity of the directly stimulated neurons, painstaking psychophysical experiments have provided extensive information about the directly activated neurons responsible for the rewarding effect, the operating principles of the neural circuitry in which they are embedded, and the reward-seeking behavior generated by this circuitry. Of particular importance to the current study are experiments characterizing the spatiotemporal integration of the electrically evoked reward signals and the growth of the rewarding effect as a function of stimulation strength.
In contrast to the case of eICSS, the identity of the directly stimulated substrate responsible for optical intracranial self-stimulation (oICSS) of midbrain dopamine neurons is known -the stimulation specifically and directly activates the dopaminergic neurons. However, only rudimentary information is available to date about the growth or spatiotemporal integration of the optically induced reward signal. One purpose of this document is to explore what the current experiment reveals about spatiotemporal integration and reward growth in the neural circuitry underlying oICSS. Another purpose is to explore the implications of the current work for understanding how eICSS and oICSS are related at the level of neural circuitry. We show by means of simulation that the results of the current experiment pose grave difficulties for an intuitively appealing account we and others proposed previously: that eICSS arises from the indirect activation of the directly stimulated dopamine neurons that give rise to oICSS. These difficulties challenge widely held notions about the organization of brain circuitry underlying reward.

Preliminaries
Setting the graph2files variable to true will cause graphs to be written to external files. These files are stored in a subdirectory of the current folder called 'Figures.' Setting this variable to false will speed execution.
The show_graphics variable determines whether graphics are stored within this Live Script. This variable must be set to false to enable efficient editing of this Live Script. When this Live Script has been executed with show_graphics set to true, subsequent editing and task switching will be slowed unacceptably. Set show_graphics to true only in preparation for saving updated html and/or pdf copies of this file. To prepare for editing after this script has been stored with show_graphics set to true, reset this variable to false, re-run the script, and save it.
Setting the tabs2files variable to true will store information about equations, figures, functions, and symbols in external Excel files. Setting this variable to false may speed execution slightly.
Setting the saveWS variable to true will cause the Matlab workspace to be stored in a .mat file in the folder from which this live script was run.
Be sure to set the default_dir to the directory you wish to use as the default on your system.
As its name implies, the 'Imported_Figures' folder contains images that this Live Script will import from external files. Those images should be stored in a sub-folder of the folder from which this Live Script will be run. The name of that folder is the argument of the set_impfigdir command below.
Blocks of redundant code are included in various sections. Although this lengthens the Live Script, it reduces the memory load due to the accumulation of workspace variables. Given that most variables are cleared at the boundaries between major sections, the blocks of redundant code enable debugging by means of "Run Section," "Run and Advance," etc. In a future revision, efficiency may be increased by aggregating the variables assigned in these blocks in structures that are saved and re-loaded.  Figures', graphs2files); % Sub-directory of default directory % Define the directory containing stored images to be imported. ImpFigDir = set_impfigdir('Imported_figures'); % Sub-directory of default directory % Define the directory that will house the tables & create if if necessary. tabdir = set_tabdir('Tables', tabs2files); % Sub-directory of default directory The following technical section implements preliminary steps required to set up the simulations.
In order to set up the simulations, some code must be executed to define basic functions and initialize variables. This document was designed to run on any version of Matlab from R2018a onwards without requiring installation of external function files or scripts. All functions are either built-in (supplied by the Mathworks as part of the standard Matlab installation) or local (i.e., defined at the end of this Live-Script document). This should allow this script to execute on any standard installation of the supported Matlab versions.
The following code initializes the tables that store information about the formatted equations, figures and symbols used in this script.

Load the numbers and descriptions of the pre-defined functions (see below, "Tools for this live script" and "Building blocks for the functions included in the simulations."
Enter symbols defined in the introductory paragraphs above.

The mountain model tic;
The reward-mountain model was developed to account for operant performance as a function of the cost and strength of an experimenter-controlled reward, within the context of eICSS studies. The task performed to obtain the rewarding stimulation entails putting time on a clock by holding down a lever; a reward is delivered when the cumulative time the lever has been depressed reaches an experimenter-defined criterion (the "price" of the reward). The measure of operant performance employed is time allocation: the proportion of trial time that the rat devotes to procurement of the rewarding stimulation ("work"). The model is derived in the main body of the accompanying manuscript and then adapted for application to oICSS. Papers cited here are listed in the reference section of the main body of the manuscript.
In this section, we define the functional building blocks that will be used to simulate the output of the rewardmountain model.
We distinguish between the "shell" and "core" of the model. The shell consists of the variables that are observed (time allocation), manipulated (pulse frequency, price), and controlled (stimulation parameters held constant, physical work required to hold down the lever, affordances of the test environment for leisure activites, such as grooming, resting, and exploring). The core consists of the functions that compute the intensity of the reward produced by the stimulation train and combine this value with the opportunity and effort costs entailed in its procurement, thus generating what we call "payoffs." A single function based on the generalized matching law translates the payoffs generated in the core into the time-allocation values that are manifest in the shell.
Please see the main body of the text for definitions, explanations, and details.
The following graph relates the induced firing frequency to the electrical pulse frequency: This graph shows that the firing of the directly stimulated neurons subserving eICSS of the medial forebrain bundle can follow the pulse frequency faithfully up to very high values (>= 350 pulses per second); the response of the neurons flattens abruptly as pulse frequency is increased further. This high-fidelity frequency following is consistent with the view that the directly stimulated cells subserving the rewarding effect are non-dopaminergic neurons with myelinated axons (Shizgal, 1997;Bielajew & Shizgal, 1986;Gallistel, Yeomans & Shizgal, 1981;Bielajew & Shizgal, 1982;Shizgal et al., 1980).
In addition to finding firing frequencies corresponding to a given pulse frequency, we will also need to do the reverse: finding the pulse frequency required to produce a given firing frequency. Thus, we define a backsolution of the frequency-following function: [eqn_num, eqn_tab] = add_eqn(eqn_num, eqn_tab, "fFbacksolved", "Back-solution of the fr This is equation #2: (fFbacksolved)

The burst-duration function
For completeness, we include a function that maps the duration of the pulse train into the duration of the evoked increase in the firing of the first-stage neurons. In practice, we assume that the two are equal, as will be case when frequency-following fidelity is high.

The subjective-probability function
This is another dummy function. As we review below, the subjective probability that a reward will be delivered upon satisfaction of the response requirement appears to equal the objective probability when the latter is 0.5 or higher. In the case of the present oICSS experiment, the objective probability of reward is always one.
n.b. A lower-case "p" is employed in the symbols.

The subjective-price (opportunity-cost) function
As described in Solomon et al., (2017), the subjective-price function is defined as In addition to finding subjective prices corresponding to objective ones, we will also need to do the reverse. Thus, we define a back-solution of the subjective-price function: [eqn_num, eqn_tab] = add_eqn(eqn_num, eqn_tab, "fPbacksolved", "Back-solution of the su This is equation #7: (fPbacksolved)

The subjective effort-cost function
The physical work entailed in holding down the lever is transformed into the subjective rate of exertion by the following function: We do not attempt to model the form and parameters of this function. The dots over and signify that we define these quantities as rates.
These graphs shows how the steepness parameter rotates the reward-growth functions around their midpoint. These graphs show how the parameter, , rescales the output of the reward-growth function vertically, thus determining its asymptotic level.
The preceding graphs show how rescaling of the input to and output from the reward-growth function produce orthogonal changes. This is true as long as as the firing of the directly stimulated neurons keeps pace with the pulse frequency. However, once the value of the position parameter nears the maximum firing frequency of the directly stimulated neurons (e.g., when the current is very low), rightward shifts of the reward-growth function, such as those produced by further decreases in current, are accompanied by decreases in its upper asymptote. The neurons can no longer fire fast enough to drive reward intensity to the same maximum as was achieved when the value of the position parameter was lower (e.g., because the current was higher). This has implications for the interpretation of changes in the location parameters of the reward mountain, as discussed below. When the middle of the frequency roll-off zone ( ) is positioned at 362 pulses (the median value reported by Solomon et al., 2015) and is 100 pulses , the normalized reward-growth function approaches an upper asymptote of one, as expected. However, doubling to ~200 pulses moderately decreases the upper asymptote. A further doubling to ~400 pulses pushes past , markedly truncating the growth of reward intensity. close all;

The location parameter of the reward-growth function for BSR
The location parameter of the reward-growth function is the firing rate that drives reward intensity to half its maximal value. The value of this parameter depends on the number of stimulated first-stage neurons, N, and the interval during which the stimulation train elevates their firing rate, .
We assume that .
The counter model is implicit in the equation for . According to this model, the aggregate rate of firing in the directly stimulated neurons determines the intensity of the rewarding effect.

The behavioral-allocation function
The payoffs from pursuit of BSR and engagement in leisure activities are used by the sole core→shell function to compute the allocation of time to pursuit of BSR.
Restating this equation symbolically, we obtain When the reward intensity produced by the stimulation approaches its upper asymptote, the price at which is: [eqn_num, eqn_tab] = ... add_eqn(eqn_num, eqn_tab, "Psub_e", "Definition of Psub_e"); We can now define the equation for normalized time allocation as: defines the surface of the reward-mountain model in a three-dimensional space defined by the two independent variables, the pulse frequency and the price , and a single dependent variable, time allocation .
Thus, this equation bridges the core of the model by linking three shell variables, two inputs and one output; the bridge is constructed from the shell → core, core, and core → shell functions. The slope of the mountain surface depends on the payoff-sensitivity parameter and the steepness parameter of the reward-growth equation, . The location of the mountain in the plane defined by the independent variables is determined by the two location parameters: whereas positions the mountain along the price axis, positions the mountain along the pulse-frequency axis. Finally, the altitude of the "valley floor" and "summit" are set by and , respectively.
To simulate the mountain model, we must obtain values for the position parameters. . We proceed as follows: [fun_num, fun_tab] = add_fun(fun_num, fun_tab, "FpulseHMfun", ... "C, D, Fbend, FPmax, Fro, N, RhoPi, varargin",... "Function to compute the pulse frequency that produces half-maximal reward intensit This is function #30: (FpulseHMfun) The parameter that determines the location of the mountain along the price axis is . Before computing , we must determine the maximum value of the normalized reward intensity ( ).
If the value of the position parameter of the reward-growth function is too high given the frequency roll-off parameters, will be less than one.

Moving the mountain: validation studies
tic; close all; We have carried out a series of experiments that test the mountain model. Given that there are two independent position parameters , it should be possible to move the mountain independently along either of the axes representing the objective price and the pulse frequency. This prediction has been tested in three ways.

(See Functions that plot graphs and set attributes below.)
Although the change in the position of the mountain following and increase in the number of reward-related neurons recruited can be discerned readily in the 3D plots, the effect is best depicted by means of contour graphs and a bar graph showing the changes in the position paramter. The contour graphs capture in two dimensions all of the spatial information in the 3D plots.
ContLoI = plot_contour (Felec,Pobj,Tmfb1,PobjE(1), FmfbHM (1), 'off', 'ContLoI', 'low strcat({'N = '}, num2str(N(1))), graphs2files, FigDir); ContHiI = plot_contour (Felec,Pobj,Tmfb2,PobjE(2), FmfbHM (2) [fun_num, fun_tab] = add_fun(fun_num, fun_tab, "quad_subplot", ... "cont1, cont2, bg, quad_sub_out, bg_root, graphs2files, figdir",... "Function to plot four graphs in a 2 x 2 mosaic"); This is function #33: (quad_subplot) The contour graph for the low-current condition is shown twice, once in the upper-left panel and once in the lower-right panel. By comparing the horizontal position of the mountain in the left column, the reader can quickly discern whether there has been a shift along the price axis, and by comparing the vertical position of the mountains in the bottom row, the reader can quickly discern whether there has been a shift along the pulse-frequency axis. As required by the mountain model, the latter shift is observed when the number of neurons recruited is increased due to boost in the stimulation current. The bar graph in the upper right provides a summary of the simulated shifts. The tiny rightward shift along the price axis is due to the fact that is a little closer to one at the higher current than at the lower current (due to the fact that is lower). Arvanitogiannis and Shizgal (2008) tested the effect of varying the stimulation current on the position of the reward mountain. In all four rats, the mountain shifted along the pulse-frequency axis as predicted. In two of these rats, there was no corresponding shift along the price axis, but rightward shifts were seen in the remaining two subjects. This deviation from the predictions was attributed to the heterogeneity of the stimulated neurons. Such a deviation would be expected if the stimulation activated two subpopulations of reward-related neurons that project to separate integrators with converging outputs (Arvanitogiannis, Waraczynski & Shizgal, 1996;Arvanitogiannis & Shizgal, 2008). The convergence model described below provides an example of such an arrangement.
PobjE (2), PsubBend, PsubMin, RnormMax (2)); MTNloD = plot_MTN (Felec,Pobj,Tmfb1,'off','MTNloD',... graphs2files,FigDir); MTNhiD = plot_MTN (Felec,Pobj,Tmfb2,'off','MTNhiD', (1))), graphs2files, FigDir); ContHiD = plot_contour (Felec,Pobj,Tmfb2,PobjE(2), FmfbHM (2) tic; close all; Arvanitogiannis and Shizgal (2008) assessed the effect of increasing the train duration from 0.25 to 1.00 s in four rats. The data in all four cases correspond to the prediction: there were statistically reliable shifts of the reward mountain along the pulse-frequency axis but not along the price axis. The effect of increasing the train duration was tested in an additional six rats by Breton et al. (2014). In all six cases, the mountain shifted as predicted along the pulse-frequency axis, and in three of these cases, no reliable shifts along the price axis were observed. In the remaining three cases, increasing the train duration did produce rightward shifts along the price axis. These were again hypothesized to due to the recruitment of reward-related neurons projecting to separate integrators. The convergence model describe below offers a new interpretations of the rightward shifts.

Effect of varying reward probability
tic; close all; Whereas changing the current or train duration is predicted to shift the reward mountain along the pulsefrequency axis but not along the price axis, changing the probability of delivering a reward upon satisfaction of the response requirement is predicted to produce an orthogonal shift: the mounain should move along the price axis but not along the pulse-frequency axis. This can be seen readily by inspection of Intuitively, changing the reward probability should have no effect on the pulse-frequency required to produce a reward of half-maximal intensity, but is should rescale the payoff produced by this reward. Following the change in reward probability, the reward produced by a given pulse frequency is as intense as it was previously, but the ability of this pulse train to compete with alternate sources of reward will depend on the likelihood that the rat gets paid for the work it performs to obtain the electrical stimulation.
We will simulate the effect of changing the reward probability, first from 1.0 to 0.75 and then from 1.0 to 0.5:  (1) (1), PsubBend, PsubMin, RnormMax); Tmfb2 = TAfun(a, Felec,FelecBend,FmfbHM,FelecRO,g,Pobj,... PobjE(2), PsubBend, PsubMin, RnormMax); MTNp1 = plot_MTN (Felec,Pobj,Tmfb1,'off','MTNp1','p = 1.0',... graphs2files,FigDir); MTNp0p5 = plot_MTN (Felec,Pobj,Tmfb2,'off','MTNp0p5','p = 0.5',...  Breton, Conover and Shizgal (2014) tested the effect of decreasing the reward probability from 1.00 to 0.75. The reward mountains obtained from all 10 rats shifted along the price axis as predicted. There was no statistically reliable shift along the pulse-frequency axis in the position of the mountains obtained from six of these rats. In the remaining four cases, the shifts along the pulse-frequency axis were small in comparison to the shifts along the price axis and were inconsistent in direction. Decreasing the probability of reward further to 0.5, increased the size of the shifts along the price axis in all seven rats tested. In five cases, there were no shifts observed along the pulse-frequency axis, and in the two cases in which such shifts were detected, they were inconsistent in direction. Data from one subject showing the effect of reducing the reward probability to 0.5 are shown below

The significance of orthogonal shifts
tic; close all; The validation experiments just described show conclusively that the reward mountain can be displaced along either the pulse-frequency or price axes, as predicted by the underlying model. In most cases, these displacements are orthogonal: the mountain shifts either along one axis or the other.
These observations have important implications for the form of the reward-growth function at the heart of the mountain model. In order for shifts along more than one axis to be possible, the form of the reward-growth function must distinguish changes that rescale its input and output. What would happen if frequency-following fidelity remained the same, but the input-scaling and output-scaling parameters of the reward-growth function were no longer independent? We can simulate such a case by replacing the logistic reward-growth function with a power function: In contrast to the case of logistic growth, the two scaling constants act jointly, in an inseparable manner, and exclusively on the output of the function. Changing either scaling constant shifts the power-growth function along the y axis but does not change its position along the x axis. As in the case of the logistic reward-growth function, g determines the steepness of reward growth (the slope on double-logarithmic coordinates). Although appears to scale the output, whereas appears to scale the induced firing frequency, shows that the two constants function as one. Thus, reward intensity will grow as a power function of the electrical pulse frequency until the induced firing frequency in the stimulation reward-related neurons asymptotes. Changing the current displaces the logistic reward-growth function laterally, a shift that is orthogonal to the one produced by changing the scale parameter, . In contrast, changing the current displaces the power reward-growth function vertically, the same direction as the shift produced by changing the scale parameter. This is shown graphically below.
In the case of eICSS, we know from the experiments of Gallistel's group that reward intensity grows in a manner similar to a logistic function of pulse frequency. Unlike the case of power growth, logistic growth is characterized by independent parameters that scale the input and output. This characteristic is what causes the effect of varying reward probability to shift the reward mountain in a direction orthogonal to the effects of varying current or train duration. It follows that when such orthogonal shifts are observed, this requires that the underlying reward-growth function incorporates independent parameters that scale the input (the position parameter) and the output (the parameter representing the maximum reward intensity). Thus even in the absence of data from matching experiments or other methods for measuring the growth of reward intensity, we can make inferences about the reward-growth function by observing the way in which various manipulations shift the reward mountain. This will be important for the interpretation of the oICSS data.

At what stage of processing does perturbation of dopaminergic neurotransmission alter reward seeking in eICSS?
tic; close all; We have described changes in the position parameter as alterations in the sensitivity of the rewardgrowth function. One might best label as the "inverse-sensitivity" parameter because the lower its value, the weaker the input required to drive reward intensity to a given level. In contrast, we have described changes in the output-scaling parameter as alterations in the gain of the reward-growth function. Just as turning the volume knob on an audio amplifier changes the loudness produced by weak and strong inputs by the same percentage, changing alters the reward intensity produced by both low and high pulse frequencies (within the frequency-following range) by the same percentage. Changes in sensitivity shift the reward mountain along the pulse-frequency axis, whereas changes in gain shift the mountain along the price axis.
Operant peformance in the eICSS paradigm is altered by perturbation of dopaminergic neurotransmission. Boosting dopaminergic signalling typically enhances performance, whereas attenuating dopaminergic signalling typically attenuates performance. These changes have long been attributed to the modulation of reward sensitivity by dopaminergic agents (See: Hernandez et al., 2010). If so, drugs that alter dopaminergic neurotransmission should shift the reward mountain along the pulse-frequency axis and not along the price axis.
We have shown that this is not so. In rats working for electrical stimulation of the medial forebrain bundle, we demonstrated that enhancement of dopaminergic signalling by the highly specific reuptake blocker, GBR-12909, shifted the reward mountain rightward along the price axis in 8/10 rats without producing any statistically reliable shifts along the pulse-frequency axis (Hernandez et al., 2012). The D2/D4/5HT7 receptor blocker, pimozide, shifted the reward mountain leftward along the price axis in 5/6 rats without producing any statistically reliable shifts along the pulse-frequency axis (Trujillo-Pisanty, Conover & Shizgal, 2014). To account for these complementary effects, the changes in dopaminergic neurotransmission had to have altered one or more of the variables that determine the position of the reward mountain along the price axis, such as the output-scaling parameter of the reward-growth function , the subjective effort cost ( ), or the payoff from alternate activities ( ). An intuitively appealing interpretation of these data is based on the notion that the neurons subserving eICSS of the medial forebrain bundle project to midbrain dopamine neurons and that it is the transynaptic activation of the dopamine neurons that gives rise to the reward effect of the medial forebrain bundle stimulation (the "series" model of brain-reward circuitry). In the following section, we apply the reward mountain model to the oICSS data from the present paper. We show that the series model cannot account for the data from both the eICSS and oICSS studies. After demonstrating this, we discuss "convergence" models that show promise for an intergrated account of both eICSS and oICSS, and we propose experiments that put such accounts to empirical test.
toc Elapsed time is 0.748551 seconds.

Optical intracranial self-stimulation of midbrain dopamine neurons
tic; close all; In the experiments reviewed above, rats worked to trigger trains of electrical current pulses applied to the medial forebrain bundle. In contrast, in the experiment we now discuss, rats worked to trigger trains of optical pulses delivered through an optical fiber positioned over the ventral tegmental area of the midbrain. As a result of viral transcription and Cre-Lox recombination, dopamine neurons with somata in, or adjacent to, the VTA expressed channelrhodopsin-2 (ChR2) and thus could be excited by light delivered at an appropriate wavelength (473 nm). As others have reported previously (Witten et al., 2012), rats expressing ChR2 in midbrain dopamine neurons work vigorously to obtain such optical stimulation.
We show that TH-Cre rats expressing ChR2 in midbrain dopamine neurons learn to perform the cumulative hold-down task to receive optical stimulation of the ventral tegmental area (VTA), where the somata of dopamine neurons projecting to forebrain targets reside. As in the case of eICSS, time allocation varies smoothly as a function of the strength and price of the optical stimulation. The surface defined by the mountain model fits the data well.
To boost dopaminergic neurotransmission, we administered 20 mg/kg of the dopamine-transporter blocker, GBR-12909. Under the influence of the drug, the mountain fitted to the data from rat BeChR29 shifted leftwards along the pulse-frequency axis and rightwards along the price axis.
[fig_num, fig_tab] = add_fig(fig_num, fig_tab, "BeChR29vehdrgMtn", ... "Effect of GBR-12909 on the reward-mountain data obtained from rat BeChR29"); This is figure #30: (BeChR29vehdrgMtn) These shifts can be most clearly discerned in the contour-plot display and bargraph summary: The error bars surrounding the data point for each rat represent the bootstrapped 95% confidence interval (CI) for the parameter in question, whereas the boxes denoted by the heavy solid lines represent the inter-quartile range (IQR) of the median parameter estimate for the group of seven rats. The and values have been corrected for differential frequency-following fidelity in the vehicle and drug conditions. (See below and main body of the manuscript.)

Modeling the results
We now undertake modeling to determine how the optical pulse train can be translated into the observed behavioral responses in a manner consistent with the observed effect of dopamine-transporter blockade on the reward mountain.
The first step is to relate the induced frequency of firing in the dopamine neurons to the optical pulse frequency. For this purpose, we adopt a filter function of the form defined by disp(string({strcat({'Function '},num2str(fun_tab.Number(fun_tab.Name=='FilterFun')))}) Function 6 , which provides an accurate description of how medial forebrain bundle fibers subserving electrical intracranial self-stimulation respond to the electrical pulse frequency. It is not known how well the functional form describes frequency-following in optically activated, ChR2-expressing, midbrain dopamine neurons. As explained in the main body of the manuscript, parameter values have been chosen on the basis of published electrophysiological, voltammetric, and behavioral data.
The following graph show the function employed to model frequency-following fidelity in the ChR2-expressing dopamine neurons in response to the optical pulse frequency:

The spike counter
Performance for optical stimulation of midbrain dopamine neurons depends both on the induced firing frequency and the number of dopamine neurons excited by the optical input (Ilango et al., 2014). The simplest assumption consistent with this finding is that the behavioral effects depend on the aggregate rate of induced firing, as is the case of the behavioral effects produced by rewarding electrical stimulation of the medial forebrain bundle. The aggregate rate of firing induced by the optical stimulation in the population of dopamine neurons is the product of the number of optically activated neurons and the induced firing frequency:

The effect on the reward mountain produced by modulation of dopaminergic neurotransmission
tic; close all; The influence of the induced firing of the dopamine neurons can be modulated by drugs, such as transporter blockers, that alter synaptic transmission. To capture such effects, we add a scalar at the output of the dopamine spike count: [eqn_num, eqn_tab] = ... add_eqn(eqn_num, eqn_tab, "DAdrive", "scaled post-synaptic influence of the induced This is equation #31: (DAdrive) [sym_num, sym_tab] = add_sym(sym_num, sym_tab, "Kda", "constant that scales the postsyn Symbol Kda has already been entered.
In the following schema, the aggregate rate of firing is represented by the Π symbol and the dopamine-drive scalar by a triangle: We refer to the scaled output of the activated population of midbrain dopamine neurons as "dopamine drive."

The reward-growth function for oICSS
In six of the seven rats, dopamine-transporter blockade shifted the mountain leftward along the pulse-frequency axis. As we demonstrated above, such shifts require that the function that translates dopamine drive into reward intensity (i.e., the reward-growth function) must have a position parameter that is independent of the parameter that sets the maximum reward intensity attainable. The logistic function described by Gallistel's group in the case of eICSS has this property, and we have therefore added such a function at the output of the counter. In the case of eICSS, the form of the two-dimensional reward-growth function employed in the simulations is based on the empirical measurements obtained by Gallistel's group. No such measurements have yet been obtained for oICSS. Future experiments will be required to determine whether the logistic form or another form with independent input-and output-scaling parameters provides the best description of reward growth in the case of oICSS.
In the case of eICSS, the reward-growth function has been generalized to three dimensions {reward intensity, pulse frequency, train duration} on the basis of additional empirical data (Sonnenschein, Conover, & Shizgal, 2003). Both the pulse frequency and train duration have been manipulated in oICSS experiments (Ilango et al., 2014), but not in a way that makes it possible to describe the joint dependence of reward intensity on these two variables. Future experiments will be required to determine this. For present purposes, we will again adopt the function derived from eICSS data for use in simulating oICSS performance.
To compute time allocation, we also require the value of the parameter that positions the mountain along the price axis, . In the initial simulation, we assume that the value of this parameter is unaffected by the blockade of the dopamine transporter.

% PobjE is a two-element vector
We can now estimate time allocation and simulate the reward mountains for the vehicle and drug conditions using the full logistic reward-growth model shown in the figure below:  (1)); Tda2 = TAfun (a,Fopt,FdaBend,FdaHMkDA(2),FdaRO,g,Pobj,PobjE(2), ... PsubBend, PsubMin, RnormMax (2)); Before plotting the bar graphs, we need to correct the estimates of the location parameters for the differential frequency-following fidelity in the drug and vehicle conditions. The mountain is shifted downwards along the pulse-frequency axis in the drug condition and thus, frequency-following fidelity is better than in the vehicle condition.
As we explain in the main body of the manuscript,  (2))); MTNkDA2 = plot_MTN (Fopt,Pobj,Tda2,'off','MTNkDA2',title_str2,... graphs2files,FigDir,xmin,xmax,ymin,ymax In the above simulation, the drug-induced boost in the effectiveness of dopaminergic neurotransmission has rescaled the input to the reward-growth function. As a result, the mountain shifts leftwards along the pulsefrequency axis. This captures one of the observed effects of GBR-12909.
The small shift along the price axis is due to the fact that is closer to one in the drug condition than in the vehicle condition. This is due to the fact that is lower in the drug condition and thus further from the pulse frequency at which frequency following begins to roll off. The corrected value of is designated by the dot-dash cyan line on the red bar.
The difference in frequency-following fidelity in the drug and vehicle conditions also produces a small shift in . This shift is removed by the correction (dot-dash cyan line on the blue bar).
The empricially observed rightward shifts along the pulse axis survive the correction for differential frequencyfollowing fidelity. To capture these, a scalar increase must be produced at or beyond the output of the rewardgrowth function. In the example shown below, we do this by having the drug-induced increase in dopamine tone reduce the subjective effort cost, as proposed by Salomone and colleagues.
Here are the shifts observed in the mountains obtained from rat BeChR29 and the shifts simulated above using the and values derived from that rat's data: In the previous section, we demonstrate how the effects of dopamine-transporter blockade on oICSS can be explained within the context of the mountain model by a combination of changes in phasic and tonic dopamine signaling. In this section, we pose an additional challenge: can the explanation proposed for the oICSS data also account for the known effects of dopamine-transporter blockade on eICSS? Below, we show that the proposed explanation fails to meet this challenge, we discuss the implications for the "series-circuit" model of eICSS, and we develop a new model that can account for the effects of dopamine-transporter blockade on both oICSS and eICSS.
The series-circuit model treats oICSS and eICSS as behavioral manifestations of the effects produced by injecting signals at two different neural stages of the same pathway. On this view, eICSS arises from electrically induced activation of highly excitable, non-dopaminergic neurons that project directly or indirectly to midbrain dopamine neurons. In other words, the directly activated neurons that give rise to eICSS are in series with the dopamine neurons that render the electrical stimulation rewarding. The midbrain dopamine neurons are excited trans-synaptically in the case of eICSS and directly in the case of oICSS. The consequences of their activation are the same: the subject seeks to re-initiate the stimulation and will pay large effort and opportunity costs when the strength of the stimulation suffices to produce a large increment in the aggregate firing rate of the dopamine neurons.
Here, we show only the portion of the model that generates the reward-intensity signal.
[ That figure shows that modulation of dopaminergic neurotransmission alters eICSS performance by shifting the reward mountain along the price axis and not along the pulse-frequency axis. This implies that the drug-induced change in dopamine signaling acts at or beyond the output of the reward-growth function for eICSS. For this to occur in the series-circuit model, the reward-growth function for eICSS must lie upstream (to the left) of the dopamine neurons.
A second reward-growth function must lie downstream (to the right) of the dopamine neurons. This is required because GBR-12909 shifted the reward mountain for oICSS along the pulse-frequency axis, as shown in Name=='GBRshiftSummary')))})); 32 We explain above (The significance of orthogonal shifts) that the mountain will move along the pulsefrequency axis only if the input to the reward-growth function has been rescaled. Such rescaling occurs when the magnitude of dopamine transients is boosted by GBR-12909 and the reward-growth function for oICSS is positioned downstream of the dopamine neurons.

Series-circuit model: changes in the reward mountain for eICSS in response to dopamine-transporter blockade
To generate a reward mountain for eICSS from the series-circuit model, we proceed in several stages. As was done above (The reward-growth function for oICSS), the optical pulse frequency required to produce a halfmaximal reward intensity is calculated directly from disp (string({strcat({'Equation '},num2str(eqn_tab.Number(eqn_tab.Name=='fH')))})); Equation 17 We next need to compute the electrical pulse frequency (applied to a medial forebrain bundle electrode) that produces excitation in the dopamine neurons equivalent to that produced by a given train of optical pulses delivered to the midbrain dopamine neurons. According the series-circuit model, all inputs that produce the same peak output from the dopamine neurons will produce the same rewarding effect. This will be true regardless of whether the dopamine neurons are excited directly by optical activation or indirectly by transsynaptic input from medial forebrain bundle neurons activated by electrical stimulation. According to the model, an observer placed downstream from the dopamine neurons and supplied only with information about the aggregate peak output of these neurons cannot know whether optical or electrical stimulation was responsible for a given phasic increase in dopamine release. It is the peak magnitude of this phasic increase in aggregate firing that determines the intensity of the rewarding effect.
(See "The spike counter" above.) To obtain the electrical pulse frequency required to produce a reward of half-maximal intensity, we need to back-solve the equations describing the stages of the model that intervene between the electrode and the dopamine neurons. The back-solutions return the electrical pulse frequency that delivers an input to dopamine neurons equivalent to the optical pulse frequency that produces a half-maximal reward intensity.
The reward-growth functions (S-shaped curves in rectangular boxes) in the flow diagrams are normalized: Their output varies from zero to one and is then scaled by the variable in the triangle to their right. In the case of the upstream reward-growth function in (the reward-growth function to the left of the dopamine neurons), that scaling variable is . The value of this variable determines the maximum input that the electrode can deliver to the dopamine neurons, scaled in terms of the equivalent optical pulse frequency. In the initial simulation below, is set to 63 pulses per second, and given the parameters of the frequency-following function of the dopamine neurons { }, this will drive the dopamine neurons to a near-maximal level (~43 spikes ). We demonstrate below that the qualitative effect of dopamine transporter blockade is little affected by the value of .
The function that computes the required electrical pulse frequency in the series-circuit model, FscHMbs, comprises three back-solutions. (See Functions composing the reward-mountain model below.) First, we invert the scaling of the output of the upstream reward-growth function: we divide the optical pulse frequency required to produce a reward of half-maximal intensity by . This gives us the output of the upstream normalized reward-growth function (a value between zero and one), the fraction of required to produce a half-maximal reward intensity at the output of the dopamine neurons (the FdaHMkDA value). We then backsolve the upstream reward-growth function (by means of the LogistNormBsFun function) to obtain the average firing rate of the medial forebrain bundle neurons required to produce the reward intensity in question, given the position parameter of the logistic reward-growth function and the value of its exponent. The position parameter of the logistic is calculated using disp (string({strcat({'Equation '},num2str(eqn_tab.Number(eqn_tab.Name=='fH')))})); Equation 17 (the FFmfbHM function). Finally, we use the FilterFunBS function to obtain the electrical pulse frequency that produces this average firing rate.
The input to FscHMbs is the optical pulse frequency required to produce a reward of half-maximal intensity ( ) by direct activation of the dopamine neurons, and the output is the equivalent electrical pulse frequency ( ).
[fun_num, fun_tab] = add_fun(fun_num, fun_tab, "FscHMbs", ... RhoPiMFB = 5000; % RhoPi/N is within ~20% of 50, which is roughly consistent with Sonne FPmax = 1000; FmfbHM = FpulseHMfun (C, D, FmfbBend, FPmax, FmfbRO, NnMFB, RhoPiMFB) FmfbHM = 77.2222 In the vehicle condition of the current experiment, the median value for oICSS was 27.1. The train duration was 1 s, whereas it was 0.5 s in the corresponding eICSS study. The form and parameters of the temporal-integration function for oICSS of midbrain dopamine neurons are unknown. Faut de mieux, we will use the functional form and parameters obtained from eICSS to compute the Fhm value for a 0.5 s train that corresponds to the value obtained in the current oICSS study for 1 s trains.
In the series-circuit model, the output of the MFB neurons must pass through the midbrain dopamine neurons. If so, the temporal-integration characteristics obtained in eICSS experiments reflect those of both the directly stimulated and dopamine stages of the circuit. According to this model, integration in the dopamine stage cannot be faster than estimated in the eICSS study.
It follows from disp(string({strcat({'Equation '},num2str(eqn_tab.Number(eqn_tab.Name=='fH')))})); Equation 17 that the Fhm value for a 0.5 s train that corresponds to an Fhm value for a 1 s train is given by The MFB will have to deliver excitation equivalent to to deliver a half-maximal reward intensity from the dopamine neurons at a train duration of 0.5 s. We will determine the values of and so as to obtain the above value in the vehicle condition.
We first set the number of activated dopamine neurons arbitrarily to 100. equal to one, so the equation is simplified to: RhoPiDA = NnDA * (FfiringHmD2/(1+(C/D2))) RhoPiDA = 1.5715e+03 We are now positioned to compute the MFB pulse frequency that will drive the dopamine neurons to produce a reward of half-maximal intensity. Now, we work backwards through the MFB input to find the electrical pulse frequency that will drive the dopamine neurons to produce a reward of half-maximal intensity. To solve the time-allocation equation for the series-circuit model, we must first compute the drive on the dopamine neurons that is produced by each value of the electrical pulse frequency ( ). We express this drive in terms of the optical pulse frequency that produces equivalent firing in the dopamine neurons, .
The FoptEquivFun function first translates the electrical pulse frequency into the induced rate of firing in the medial forebrain bundle neurons, then translates this firing rate into a normalized reward intensity, and then converts the normalized reward intensity into the equivalent optical pulse frequency. Once this has been done, time allocation can be computed using the same equation and parameter values that were used to generate the reward mountain for optical stimulation. The FoptEquivFun function is the inverse of the FscHMbs function.
An observer positioned at the input to the downstream reward-growth function cannot know whether optical or electrical stimulation was responsible for the phasic dopamine signal that constitutes the input to this function. Thus, the series-circuit model cannot readily generate differential predictions in response to optical and electrical inputs.
toc Elapsed time is 1.282094 seconds.

The qualitative predictions don't depend meaningfully on the value of the parameter than scales the medial forebrain bundle drive on the dopamine neurons.
tic; clear -regexp ^GBR; if show_graphics close all; end The next set of graphs shows that the predictions don't change qualitatively when the maximum medial forebrain bundle drive on the dopamine neurons is reduced: The mountain continues to shift along both the price and pulse-frequency axes KrgEqLo = 41; % Maximum MFB drive is reduced % n.b. KrgEqLo must be sufficiently exceed FdaHMkDA so as to keep FscHMlo below the max % Otherwise the back-solution function will generate an error message and return.  (1)); Tsc4 = TAfun(a, FoptEquivLo, FdaBend, FdaHMkDA (2), FdaRO, gDA, Pobj, ...
The problem with the series-circuit model is a fundamental one. It predicts shifts of the mountain along the pulse-frequency axis in response to perturbation of dopaminergic neurotransmission. In contrast, systematic, consistent shifts along the pulse-frequency axis are not seen in eICSS studies under the influence of the dopamine transporter blocker, GBR-12909 (Hernandez et al. 2012); the dopamine, norepinephrine, and serotonin blocker, cocaine (Hernandez et al., 2010); the D2, D3 and 5HT7 receptor blocker, pimozide ( The failure of the series-circuit model to account readily for the differential movement of the reward mountain in the eICSS and oICSS studies motivates the search for an alternative. We investigate a new model here.
The challenge is to account for the stability of the reward mountain along the pulse-frequency axis under dopaminergic challenge in the eICSS data, the observed displacement along the pulse-frequency axis in the oICSS data, and the observed displacement along the price axis in both datasets.
[fig_num, fig_tab] = add_fig(fig_num, fig_tab, "Convergence_model", ... "A model implementing converging pathways subserving oICSS and eICSS"); This is figure #48: (Convergence_model) In this model, the circuitry underlying eICSS of the medial forebrain bundle and oICSS of midbrain dopamine neurons includes parallel stages that are linked at two levels. The parallel limbs subserve oICSS and eICSS between the directly stimulated neurons and the scaled output of the reward-growth functions. The upstream link between the two limbs relays to midbrain dopamine neurons input from neurons activated by electrical stimulation of the medial forebrain bundle. The downsteam link combines the outputs of the two limbs so as to produce the signal representing the benefit of the experimenter-controlled reward.

The input from the MFB to the midbrain dopamine neurons
It has long been known that electrical stimulation of the medial forebrain bundle provides trans-synaptic input to midbrain dopamine neurons (Maeda & Mogenson, 1980) and drives phasic dopamine release from their ventral striatal terminals (Gratton, Hoffer & Gerhardt, 1988;Yavich & Tiihonen, 2000a, 2000bWightman & Robinson, 2002;Yavitch & Tanila, 2007). Recently, Cossette, Conover & Shizgal (2016) demonstrated differences between the frequency-following characteristics of the MFB input to midbrain dopamine neurons that project to the medial shell of the nucleus accumbens and the frequency-following characteristics of the neurons subserving eICSS of the MFB. Whereas the neurons subserving the rewarding effect maintain highfidelity frequency following up to ~360 pulses (Solomon et al., 2015), the input to the nucleus-accumbens projecting dopamine neurons fails to follow pulse frequencies greater than 130 pulses . To accommodate this difference, a second low-pass filter is inserted below the link between the medial forebrain bundle fibers and the spike counter to the left of the dopamine neurons in At pulse frequencies below 130 pulses , the amplitude of dopamine transients recorded in the nucleus accumbens in response to electrical stimulation of the medial forebrain bundle grows as function of both current and pulse frequency (Cossette, Conover & Shizgal, 2016). Increases in current can compensate for decreases in pulse frequency so as to hold constant the amplitude of the transient (and vice-versa). This suggests that with train duration held constant, the amplitude of the transient depends on the aggregate rate of firing in the medial forebrain bundle fibers that drive the activation of the dopamine neurons. That is the reason for the second spike counter.
As in the case of the series-circuit model, the trans-synaptic drive on the dopamine neurons is represented in terms of the optical pulse frequency that provides equivalent dopaminergic activation.

Summation between the outputs of the two parallel limbs
The convergence model retains the idea that there is a final common path for neural signals that encode the predicted benefits of reward procurement. The two parallel circuit limbs, one subserving eICSS and the other oICSS, converge on this final common path. A simple way to implement this convergence is to add the outputs of the two limbs. This proposal faces a seemingly daunting challenge: electrical stimulation of the MFB activates midbrain dopamine neurons. If so, one would expect such stimulation to drive signaling in both of the hypothesized converging pathways. Wouldn't this produce at least some displacement of the reward mountain along the pulse-frequency axis in response to dopamine-transporter blockade? We show below that this is not necessarily the case. Indeed, the simulations show that given reasonable assumptions and values drawn from the current data, a convergence model can replicate the eICSS findings.

Function 34
The reward-intensity signal at the output of the oICSS limb of the circuit is computed in the same manner as when the input consists of optical stimulation pulses. However, additional steps are required to compute the output of this limb during eICSS, when electrically excited medial forebrain bundle neurons provide the input to the dopamine neurons. The electrical pulse frequency is translated in the input to the dopamine neurons (expressed in terms of the equivalent optical pulse frequency) by the function, FmfbDAdriveFun.  (Felec, FelecBend, FhmUpper, FelecRO, gElec, KrgUpper);  In the following graph, we see how reward-intensity grows in the lower limb as a function of the electrical pulse frequency.
The graph above shows that strong input is required to generate a substantial response from the lower limb.
The reason for this is that the value is fairly close to the assumed frequency-following limit of the dopamine cells. The parameter values used in this simulation yield a value close to the observed median in the current study when the train duration was 1 s. An adjustment is then made to predict model output for a train duration of 0.5 s, which was used in the eICSS work. This calculation are described above (see: The location parameter of the reward-growth function for BSR).
Growth curves for the summated reward intensity are shown here, at different values of the strength of the MFB drive on the dopamine neurons. The equivalent optical pulse frequencies (FoptEquiv) are the maximum values that the simulated MFB drive can achieve.
Adding dopaminergic modulation of effort cost (i.e., a drug-induced reduction in these costs) boosts the shift along the price axis. The same effect could be produced by boosting the output of the reward-growth function(s) or decreasing the value of alternate activities.

Remarks concerning the convergence model
The results in the influence of GBR-12901 (Hernandez et al., 2012). Thus, given moderate MFB drive and values compatible with those obtained in the present study (adjusted for a train duration of 0.5 s), the convergence model can generate outputs that match our earlier eICSS results.