Quantum Models for Psychological Measurements: An Unsolved Problem

There has been a strong recent interest in applying quantum theory (QT) outside physics, including in cognitive science. We analyze the applicability of QT to two basic properties in opinion polling. The first property (response replicability) is that, for a large class of questions, a response to a given question is expected to be repeated if the question is posed again, irrespective of whether another question is asked and answered in between. The second property (question order effect) is that the response probabilities frequently depend on the order in which the questions are asked. Whenever these two properties occur together, they pose a problem for QT. The conventional QT with Hermitian operators can handle response replicability, but only in a way that is incompatible with the question order effect. In the generalization of QT known as the theory of positive-operator-valued measures (POVMs), in order to account for response replicability, the POVMs involved must be conventional operators. Although these problems are not unique to QT and also challenge conventional cognitive theories, they stand out as important unresolved problems for the application of QT to cognition. Either some new principles are needed to determine the bounds of applicability of QT to cognition, or quantum formalisms more general than POVMs are needed.


Introduction
Quantum theory (QT) is the mathematical formalism of quantum physics. (Sometimes the two are considered synonymous, in which case what we call here QT would have to be called ''mathematical formalism of QT.'') However, QT has recently begun to be used in various domains outside of physics, including biology, economics, and cognitive science (see Text S1 Representative Bibliography). For overviews, see the recently published monographs [1] and [2], as well as the recent target article in Behavioral and Brain Sciences [3] with ensuing commentaries and rejoinders. There is one obvious similarity between cognitive science and quantum physics: both deal with observations that are fundamentally probabilistic. This similarity makes the use of QT in cognitive science plausible, as QT is specifically designed to deal with random variables. Here, we analyze the applicability of QT in opinion-polling, and compare it to psychophysical judgments.
On a very general level, QT accounts for the probability distributions of measurement results using two kinds of entities, called observables $A$ and states $\psi$ (of the system on which the measurements are made). Let us assume that measurements are performed in a series of consecutive trials numbered $1, 2, \ldots$. In each trial $t$ the experimenter decides what measurement to make (e.g., what question to ask), and this amounts to choosing an observable $A$. Despite its name, the latter is not observable per se, in the colloquial sense of the word, but it is associated with a certain set of values $u(A)$, which are the possible results one can observe by measuring $A$. In a psychological experiment these are the responses that a participant is allowed to give, such as Yes and No.
The probabilities of these outcomes in trial $t$ (conditioned on all the previous measurements and their outcomes) are computed as some function of the observable $A$ and of the state $\psi^{(t)}$ in which the system (a particle in quantum physics, or a participant in psychology) is at the beginning of trial $t$,

$$\Pr[u(A) = u \text{ in trial } t \mid \text{measurements in trials } 1, \ldots, t-1] = F(\psi^{(t)}, A, u). \tag{1}$$

This measurement changes the state of the system, so that at the end of trial $t$ the state is $\psi^{(t+1)}$, generally different from $\psi^{(t)}$. The change $\psi^{(t)} \to \psi^{(t+1)}$ depends on the observable $A$, the state $\psi^{(t)}$, and the value $u(A)$ observed in trial $t$,

$$\psi^{(t+1)} = G(\psi^{(t)}, A, u). \tag{2}$$

On this level of generality, a psychologist will easily recognize in (1)-(2) a probabilistic version of the time-honored Stimulus-Organism-Response (S-O-R) scheme for explaining behavior [4]. This scheme involves stimuli (corresponding to $A$), responses (corresponding to $u$), and internal states (corresponding to $\psi$). It does not matter whether one simply identifies $A$ with a stimulus, or interprets $A$ as a kind of internal representation thereof, while interpreting the stimulus itself as part of the measurement procedure (together with the instructions and experimental setup, which are usually fixed for the entire sequence of trials). What is important is that the stimulus determines the observable $A$ uniquely, so that if the same stimulus is presented in two different trials $t$ and $t'$, one can assume that $A$ is the same in both of them.
The state $\psi^{(t+1)}$ determined by (2) may remain unchanged between the response $u$ terminating trial $t$ and the presentation of (the stimulus corresponding to) the new observable that initiates trial $t+1$. In some applications this interval can indeed be negligibly small or even zero, but if it is not, one has to allow for the evolution of $\psi^{(t+1)}$ within it. In QT, the ''pure'' evolution of the state (assuming no intervening inter-trial inputs) is described by some function

$$\psi_\Delta^{(t+1)} = H(\psi^{(t+1)}, \Delta), \tag{3}$$

where $\Delta$ is the time interval between the recording of $u$ in trial $t$ and the observable in trial $t+1$. This scheme is somewhat simplistic: one could allow $H$ to depend, in addition to the time interval $\Delta$, on the observable $A$ and the outcome $u$ in trial $t$. We do not consider such complex inter-trial dynamics schemes in this paper.
The reason we single out opinion-polling and compare it to psychophysics is that they exemplify two very different types of stimulus-response relations.
In a typical opinion-polling experiment, a group of participants is asked one question at a time, e.g., a = ''Is Bill Clinton honest and trustworthy?'' and b = ''Is Al Gore honest and trustworthy?'' [5]. The two questions, obviously, differ from each other in many respects, none of which has anything to do with their content: the words ''Clinton'' and ''Gore'' sound different, and the participants know many aspects in which Clinton and Gore differ, besides their honesty or dishonesty. Therefore, if a question, say, b, were presented to a participant more than once, she would normally recognize that it had already been asked, which in turn would compel her to repeat her earlier response, unless she wants to contradict herself. One can think of situations when the respondent can change her opinion, e.g., if another question posed between two replications of the question provides new information or reminds her of something forgotten. Thus, if the answer to the question a = ''Do you want to eat this chocolate bar?'' is Yes, and the second question is b = ''Do you want to lose weight?'', the replication of a may very well elicit the response No. It is even conceivable that if one simply asks the chocolate question twice, the person will change her mind, as she may think the replication of the question is intended to make her ''think again.'' In a wide class of situations, however, changing one's response would be highly unexpected and even bizarre (e.g., replace a in the example above with ''Do you like chocolate?''). We assume that the pairs of questions asked, e.g., in Moore's study [5] are of this type.

In a typical psychophysical task, the stimuli used are identical in all respects except for the property that a participant is asked to judge. Consider a simple detection paradigm in which the observer is presented one stimulus at a time, the stimulus being either a (containing a signal to be detected) or b (the ''empty'' stimulus, in which the signal is absent). For instance, a may be a tilted line segment, and b the same line segment but vertical, the tilt (which is the signal to be detected) being too small for all answers to be correct. Clearly, the participant in such an experiment cannot first decide that the stimulus being presented now has already been presented before, and that it has to be judged to be a because it was so judged before.
With this distinction in mind, however, the formalism (1)-(2)-(3) can be equally applied to both types of situations. In both cases a is to be replaced with some observable $A$, and b with some observable $B$ (after which a and b per se can be forgotten). The values of $A$ and $B$ are the possible responses one records. In the psychophysical example, $u(A)$ and $u(B)$ each can attain one of two values: 1 = ''I think the stimulus was tilted'' or 0 = ''I think the stimulus was vertical''. The psychophysical analysis consists in identifying the hit-rate and false-alarm-rate functions (conditioned on the previous stimuli and responses)

$$\Pr[u(A) = 1 \text{ in trial } t \mid \text{measurements in trials } 1, \ldots, t-1] = F(\psi^{(t)}, A, 1),$$

$$\Pr[u(B) = 1 \text{ in trial } t \mid \text{measurements in trials } 1, \ldots, t-1] = F(\psi^{(t)}, B, 1).$$

The learning (or sequential-effect) aspect of such analysis consists in identifying the function

$$\psi^{(t+1)} = G(\psi^{(t)}, S, u), \qquad S \in \{A, B\}, \quad u \in \{0, 1\},$$

combined with the ''pure'' inter-trial dynamics (3).
In the opinion-polling example (say, about Clinton's and Gore's honesty), there are two hypothetical observables: A, corresponding to the question a = ''Is Bill Clinton honest?'', and B, corresponding to the question b = ''Is Al Gore honest?'', each observable having two possible values, 0 = ''Yes'' and 1 = ''No''. The analysis, formally, is precisely the same as above, except that one no longer uses the terms ''hits'' and ''false alarms'' (because ''honesty'' is not a signal objectively present in one of the two politicians and absent in another).
In quantum physics, a classical example falling within the same formal scheme as the examples above is one involving measuring the spin of a particle in a given direction. Let the experimenter choose one of two possible directions, a or b (unit vectors in space along which the experimenter sets a spin detector). If the particle is a spin-1/2 particle, such as an electron, then the spin for each direction chosen can have one of two possible values, 1 = ''up'' or 0 = ''down'' (we need not discuss the physical meaning of these designations). These 1 and 0 are then the possible values of the observables $A$ and $B$ one associates with the two directions, and the analysis again consists in identifying the functions $F$, $G$, and $H$.

A brief account of conventional QT (with measurements of the first kind)
In QT, all entities operate in a Hilbert space, a vector space endowed with the operation of scalar product. The components of the vectors are complex numbers. We will assume that the Hilbert spaces to be considered are $n$-dimensional ($n \ge 2$), but the generalization of all our considerations to infinite-dimensional spaces is trivial. The scalar product of vectors $\psi, \varphi$ is

$$\langle \psi, \varphi \rangle = \sum_{i=1}^{n} x_i y_i^*,$$

where $x_i$ and $y_i$ are components of $\psi$ and $\varphi$, respectively, and the star indicates complex conjugation: if $c = a + ib$, then $c^* = a - ib$.
The length of a vector $\varphi$ is defined as $\|\varphi\| = \sqrt{\langle \varphi, \varphi \rangle}$. Any observable $A$ in this $n$-dimensional version of QT is represented by an $n \times n$ Hermitian matrix (or operator, the two terms being treated as synonymous in a finite-dimensional Hilbert space). This is a matrix with complex entries such that, for any $i, j \in \{1, \ldots, n\}$, $a_{ij} = a_{ji}^*$. In particular, all diagonal entries of $A$ are real numbers. It is known from matrix algebra that any Hermitian matrix can be uniquely decomposed as

$$A = \sum_{i=1}^{k} u_i P_i,$$

where $u_1, \ldots, u_k$ are pairwise distinct eigenvalues of $A$ (all real numbers), and $P_i$ are eigenprojectors ($n \times n$ Hermitian matrices whose eigenvalues are zeros and ones). All eigenprojectors are positive semidefinite, i.e., for any nonzero vector $x$, $\langle P_i x, x \rangle \ge 0$, and they sum to the identity matrix, $P_1 + \ldots + P_k = I$. For any distinct $i, j \in \{1, \ldots, k\}$, the eigenprojectors satisfy the conditions

$$P_i^2 = P_i, \qquad P_i P_j = 0. \tag{8}$$

In QT, the distinct eigenvalues $u_1, \ldots, u_k$ are postulated to form the set of all possible values $u(A)$. That is, as a result of measuring $A$ in any given trial one always observes one of the values $u_1, \ldots, u_k$. For simplicity (and because all our examples involve binary outcomes), in this paper we will only deal with the observables $A$ that have two possible values $u(A)$, denoted 0 and 1. This means that all our observables can be presented as $A = P_1$, and

$$P_0 + P_1 = I, \qquad P_0^2 = P_0, \qquad P_1^2 = P_1, \qquad P_0 P_1 = P_1 P_0 = 0. \tag{9}$$

Each eigenvalue $u$ (0 or 1) has its multiplicity $1 \le d < n$. This is the dimensionality of the eigenspace $V$ associated with $u$, which is the space spanning the $d$ pairwise orthogonal eigenvectors associated with $u$ (i.e., the space of all linear combinations of these eigenvectors). Multiplication of $P_u$ by any vector $x$ is the orthogonal projection of this vector into $V$. If $d = 1$, the eigenspace $V$ is the ray containing a unique unit-length eigenvector of $A$ corresponding to $u$. The eigenvalue $1 - u$ has the multiplicity $n - d$, the dimensionality of the eigenspace $V^\perp$ which is orthogonal to $V$. If both $d = 1$ and $n - d = 1$ (i.e., $n = 2$), then $A$ is said to have a non-degenerate spectrum. In this paper we assume the spectra are generally degenerate ($n \ge 2$).
The eigenvalues 0, 1 of $A$ in a given trial generally cannot be predicted, but one can predict the probabilities of their occurrence. To compute these probabilities, QT uses the notion of a state of the system. In any given trial the state is unique, and it is represented by a unit length state vector $\psi$. (For simplicity, we assume throughout the paper that the system is always in a pure state. This restriction is not critical for our analysis.) If the system is in a state $\psi^{(t)}$ in trial $t$, and the measurement is performed on the observable $A$, the probabilities of the outcomes of this measurement are given by

$$\Pr[u(A) = u \text{ in trial } t \mid \text{measurements in trials } 1, \ldots, t-1] = F(\psi^{(t)}, A, u) = \langle P_u \psi^{(t)}, \psi^{(t)} \rangle = \|P_u \psi^{(t)}\|^2, \tag{10}$$

where $u = 0, 1$. Note that these probabilities are conditioned on the previous observables, in trials $1, \ldots, t-1$, and their observed values.
Given that the observed outcome in trial $t$ is $u$, the state $\psi^{(t)}$ changes into $\psi^{(t+1)}$ according to

$$\psi^{(t+1)} = G(\psi^{(t)}, A, u) = \frac{P_u \psi^{(t)}}{\|P_u \psi^{(t)}\|}. \tag{11}$$

This equation represents the von Neumann-Lüders projection postulate of QT. The denominator is nonzero because it is the square root of $\Pr[u(A) = u \text{ in trial } t]$, and (11) is predicated on $u$ having been observed. The geometric meaning of $G(\psi^{(t)}, A, u)$ is that $\psi^{(t)}$ is orthogonally projected by $P_u$ into the eigenspace $V$ and then normalized to unit length.
Finally, the inter-trial dynamics of the state vector in QT (between $u$ and the next observable, separated by interval $\Delta$) is represented by the unitary evolution formula

$$\psi_\Delta^{(t+1)} = H(\psi^{(t+1)}, \Delta) = U_\Delta \psi^{(t+1)}. \tag{12}$$

Here, $U_\Delta$ is a unitary matrix, defined by the property $U_\Delta^\dagger U_\Delta = I$, where $U_\Delta^\dagger$ is the conjugate transpose of $U_\Delta$, obtained by transposing $U_\Delta$ and replacing each entry $x + iy$ in it with its complex conjugate $x - iy$. The unitary matrix $U_\Delta$ should also be made a function of inter-trial variations in the environment (such as variations in overall noise level, or other participants' responses) if they are non-negligible. The identity matrix $I$ is a unitary matrix: if $U_\Delta = I$, (12) describes no inter-trial dynamics, with the state remaining the same through the interval $\Delta$. Note that the eigenvalue $u$ itself does not enter the computations. This justifies treating it as merely a label for the eigenprojectors and eigenspaces (so instead of 0, 1 we could use any other labels).
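As a minimal illustration of the single-trial cycle described by (10)-(11)-(12), the following Python sketch computes an outcome probability, applies the projection postulate, and then applies a unitary evolution. The two-dimensional real state, the projectors `P0` and `P1`, the rotation angle, and all function names are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import math

def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]

def norm(x):
    """Length of a (possibly complex) vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def outcome_probability(projector, psi):
    """Formula (10): Pr[u] = ||P_u psi||^2."""
    return norm(matvec(projector, psi)) ** 2

def collapse(projector, psi):
    """Formula (11): project psi into the eigenspace, then renormalize."""
    projected = matvec(projector, psi)
    length = norm(projected)  # nonzero whenever the outcome was observed
    return [c / length for c in projected]

def evolve(unitary, psi):
    """Formula (12): inter-trial evolution psi -> U psi."""
    return matvec(unitary, psi)

# Assumed observable A = P1 with eigenprojectors P0, P1 (n = 2).
P1 = [[1.0, 0.0], [0.0, 0.0]]
P0 = [[0.0, 0.0], [0.0, 1.0]]

# Assumed initial unit state vector, 30 degrees away from the P1 eigenvector.
psi = [math.cos(math.pi / 6), math.sin(math.pi / 6)]

p1 = outcome_probability(P1, psi)
p0 = outcome_probability(P0, psi)
psi_next = collapse(P1, psi)  # state after observing u = 1

# Assumed inter-trial evolution: a rotation, which is a real unitary matrix.
theta = 0.3
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
psi_delta = evolve(U, psi_next)

print(p1, p0)
print(psi_next, norm(psi_delta))
```

The two probabilities sum to one, and the evolved vector keeps unit length, as expected from a unitary matrix.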
Remark 1. In Pauli's terminology [6], measurements described by (10)-(11)-(12) are called measurements of the first kind. The main distinguishing feature of such measurements is that two identical measurements ''immediately following each other'' (i.e., with $U_\Delta = I$) produce identical results. In Section 5 we consider a generalized formalism that includes measurements of the first kind as a special case, but also covers a broad (arguably, most important) subclass of what Pauli calls measurements of the second kind (defined as all measurements not of the first kind, or not necessarily of the first kind).

Measurement sequences, evolution (in)effectiveness, and stability
In this section we introduce terminology and preliminary considerations needed in the subsequent analysis. Throughout the paper we will make use of the following way of describing measurements performed in successive trials:

$$(A_1, u_1, p_1) \to (A_2, u_2, p_2) \to \ldots$$

We call this a measurement sequence. Each triple in the sequence consists of an observable $A$ being measured, an outcome $u$ recorded (0 or 1), and its conditional probability $p$. The probability is conditioned on the observables measured and the outcomes recorded in the previous trials of the same measurement sequence. Thus,

$$p_i = \Pr[u(A_i) = u_i \text{ in trial } i \mid \text{measurements in trials } 1, \ldots, i-1].$$

As we assume that the outcomes $u_1, u_2, \ldots$ in a measurement sequence have been recorded, all probabilities $p_1, p_2, \ldots$ are positive if the measurement sequence exists. Recall that the observables $A_1, A_2, \ldots$ in a sequence are uniquely determined by the measurement procedures applied, $a_1, a_2, \ldots$, and that the outcomes (0 or 1) are eigenvalues of these observables.
Consider now the two-trial measurement sequence

$$(A, u, p) \to (B, w, q).$$

Let $A$ have the eigenprojectors $P_0, P_1$, and $B$ have the eigenprojectors $Q_0, Q_1$. If the initial state of the system is $\psi = \psi^{(1)}$, we have $p = \|P_u \psi\|^2$, and $\psi^{(1)}$ transforms into $\psi^{(2)} = P_u \psi / \|P_u \psi\|$. Assuming an interval $\Delta$ between the two trials, $\psi^{(2)}$ evolves into $\psi_\Delta^{(2)} = U_\Delta \psi^{(2)}$. This is the state vector paired with $B$ in the next measurement, yielding, with the help of some algebra,

$$q = \frac{\langle U_\Delta^\dagger Q_w U_\Delta P_u \psi, P_u \psi \rangle}{\|P_u \psi\|^2}. \tag{15}$$

As a special case $U_\Delta$ can be the identity matrix (no inter-trial changes in the state vector), and then we have

$$q = \frac{\langle Q_w P_u \psi, P_u \psi \rangle}{\|P_u \psi\|^2}, \tag{16}$$

because in this case $U_\Delta^\dagger Q_w U_\Delta = Q_w$. It is possible, however, that the latter equality holds even if $U_\Delta$ is not the identity matrix. In fact it is easy to see that this happens if and only if $U_\Delta$ and $B$ commute, i.e., $U_\Delta B = B U_\Delta$. For the proof of this, see Lemma 1 in Text S2 Proofs.

Definition 1. A unitary operator $U_\Delta$ is said to be ineffective for an observable $B$ if the two operators commute, $U_\Delta B = B U_\Delta$.
The justification for this terminology should be transparent: due to Lemma 1, in the computation (15) of the probability $q$ the evolution operator can be ignored, yielding (16). The notion of ineffectiveness of the evolution operator will play an important role in the analysis of repeated measurements below.
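The role of Definition 1 can be checked numerically. The sketch below, with hypothetical 2×2 matrices chosen only for illustration, compares the second-trial probability q computed with and without the evolution operator: a unitary matrix that commutes with B leaves q unchanged, whereas one that does not commute with B generally changes it. The helper names `commute` and `second_trial_probability` are assumptions of this sketch.

```python
import cmath
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def commute(a, b, tol=1e-12):
    """True if the square matrices a and b satisfy ab = ba (up to tol)."""
    ab, ba = matmul(a, b), matmul(b, a)
    return all(abs(ab[i][j] - ba[i][j]) < tol
               for i in range(len(a)) for j in range(len(a)))

def second_trial_probability(Q, U, P, psi):
    """q = ||Q U P psi||^2 / ||P psi||^2, an equivalent form of (15)."""
    post = matvec(P, psi)  # unnormalized state after the first outcome
    return norm(matvec(Q, matvec(U, post))) ** 2 / norm(post) ** 2

identity = [[1.0, 0.0], [0.0, 1.0]]
P1 = [[0.5, 0.5], [0.5, 0.5]]  # assumed eigenprojector of A for outcome 1
Q1 = [[1.0, 0.0], [0.0, 0.0]]  # assumed observable B = Q1
psi = [1.0, 0.0]               # assumed initial state

# Diagonal phases commute with Q1; a rotation does not.
U_phase = [[cmath.exp(0.7j), 0.0], [0.0, cmath.exp(-0.4j)]]
angle = math.pi / 6
U_rot = [[math.cos(angle), -math.sin(angle)],
         [math.sin(angle), math.cos(angle)]]

q_identity = second_trial_probability(Q1, identity, P1, psi)
q_phase = second_trial_probability(Q1, U_phase, P1, psi)
q_rot = second_trial_probability(Q1, U_rot, P1, psi)

print(commute(U_phase, Q1), q_identity, q_phase)
print(commute(U_rot, Q1), q_rot)
```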
Our next consideration regards the set of all possible values of the initial state vector $\psi$ for a given measurement sequence. In the applications of QT in physics, this set is assumed to cover the entire Hilbert space in which the state vectors are defined. We are not justified in adopting this assumption in psychology, as it would be too strong: one could argue that the initial states in a given experiment may be forbidden to attain values within certain areas of the Hilbert space. At the same time, it seems even less reasonable to allow for the possibility that the initial state for a given measurement sequence is always fixed at one particular value. The initial state vectors, as follows from both the QT principles and common sense, should depend on the system's history prior to the given experiment, and this should create some variability from one replication of this experiment to another. This is important, because, given a set of observables, specially chosen initial state vectors may exhibit ''atypical'' behaviors, those that would disappear if the state vector were modified even slightly. It is known [7] that in physical systems very close states may have very different physical properties. We need therefore to confine our analysis to properties that, while they may not hold for the entire Hilbert space, are stable with respect to very small changes in the initial states for which they hold. This leads us to adopt the following principle.

Stability Principle. If $\psi$ is a possible initial state vector for a given measurement sequence in an $n$-dimensional Hilbert space, then there is an open ball $B_r(\psi)$ centered at $\psi$ with a sufficiently small radius $r$, such that any vector $\psi + \delta$ in this ball, normalized by its length $\|\psi + \delta\|$, is also a possible initial state vector for this measurement sequence.
Definition 2. A property of a measurement sequence is (or holds) stable for an initial vector $\psi$ if it holds for all state vectors within a sufficiently small $B_r(\psi)$. Almost all our propositions below are proved under this stability clause, specifically by using the reasoning presented in Lemma 2 in Text S2 Proofs.
Remark 2. In Ref. [7] closeness is defined in terms of a measure called fidelity, which is different from the measure of closeness used in our stability principle. It is easy to show, however, that our measure topologically refines fidelity (i.e., any sequence of states converging to a given state in the sense of our measure also converges to that state in the sense of fidelity).

Consequences for ''aRa''-type measurement sequences
Using the definitions and the language just introduced, we will now focus on the consequences of (10)-(11)-(12) for repeated measurements with repeated responses,

$$(A, u, p) \to (A, u, p'). \tag{17}$$

Consider an opinion-polling experiment, with questions like a = ''Is Bill Clinton trustworthy?'' [5]. As argued in the Introduction, if the same question is posed twice, $a \to a$, a typical respondent, who perhaps hesitated when choosing the response the first time she was asked a, would now certainly be expected to repeat it, perhaps with some display of surprise at being asked the question she has just answered. This may not be true for all possible questions, but it is certainly true for a vast class thereof. Let us formulate this as follows.

Property 1. For some nonempty class of questions, if a question is repeated twice in successive trials (separated by one of a broad range of inter-trial intervals), the response to it will also be repeated.
Remark 3. One may be tempted to dismiss this property as readily explained by the respondent's ''simply remembering'' her previous answers. As argued in the Conclusions, however, the availability of such common sense explanations is irrelevant for our analysis, as its purpose is to determine if the phenomena we consider can be explained in the unified mathematical language of QT.
If a question a within the scope of Property 1 is represented by an observable $A$, we are dealing with the measurement sequence (17) in which $p' = 1$. Such a measurement sequence does not disagree with the formulas (10)-(11)-(12). In fact it is even predicted by them if the intervening inter-trial evolution of the state vector is assumed to be ineffective. Indeed, (15) for the measurement sequence (17) acquires the form

$$p' = \frac{\langle U_\Delta^\dagger P_u U_\Delta P_u \psi, P_u \psi \rangle}{\|P_u \psi\|^2},$$

and the ineffectiveness of $U_\Delta$ for $A$ implies

$$p' = \frac{\langle P_u P_u \psi, P_u \psi \rangle}{\|P_u \psi\|^2} = \frac{\langle P_u \psi, P_u \psi \rangle}{\|P_u \psi\|^2} = 1,$$

because $P_u^2 = P_u$ holds for all projection operators. We remind the reader that (10)-(11)-(12) define measurements of the first kind (see Remark 1). Our consideration is confined to these measurements until Section 5.
We see that ineffective evolution implies Property 1. As it turns out, under the stability principle, this implication can be reversed: effective inter-trial evolution is excluded for the observables representing the questions falling within the scope of Property 1. In other words, for all such questions, the unitary operators $U_\Delta$ can be ignored in all probability computations.

Definition 3. An observable $A$ has the Lüders property with respect to a state vector $\psi$ if the existence of the measurement $(A, u, p)$ for this $\psi$ and an outcome $u \in \{0, 1\}$ implies that the property $p' = 1$ holds stable for this $\psi$ in the measurement sequence $(A, u, p) \to (A, u, p')$. In other words, the Lüders property means that an answer to a question (represented by $A$) is repeated if the question is repeated, and that this is true not just for one initial state vector $\psi$, but for all state vectors sufficiently close to it.
Remark 4. Note that for the ineffective evolution (including the measurements that ''immediately follow each other'') the Lüders property holds for all possible state vectors $\psi$. This was taken by Pauli [6] as the defining property of the measurements of the first kind. As argued at the introduction of the stability principle in Section 2, in psychology formulations involving ''all possible initial states'' would be unjustifiably strong.
We now can formulate our first proposition.

Proposition 1. [repeated measurements] An observable $A$ has the Lüders property if and only if $U_\Delta$ in (12) is ineffective for $A$.
See Text S2 Proofs for a formal proof. In the formulation of Property 1, the interval $\Delta$ and the question represented by $A$ can vary within some broad limits, whence the ineffectiveness of $U_\Delta$ for $A$ should also hold for each of these intervals combined with each of these questions.
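To illustrate why Proposition 1 is stated under the stability clause, the following sketch uses a hypothetical three-dimensional example with a degenerate projector. For one specially chosen initial state, a repeated measurement gives p' = 1 even though the evolution operator does not commute with A; a small perturbation of that state destroys the property. With an evolution operator that commutes with A, p' = 1 survives the perturbation. All matrices, angles, and function names are illustrative assumptions.

```python
import math

def matvec(m, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def normalized(x):
    length = norm(x)
    return [c / length for c in x]

def repeat_probability(P, U, psi):
    """p' in (A,u,p) -> (A,u,p'): ||P U P psi||^2 / ||P psi||^2."""
    post = matvec(P, psi)  # unnormalized state after the first outcome
    return norm(matvec(P, matvec(U, post))) ** 2 / norm(post) ** 2

c, s = math.cos(math.pi / 3), math.sin(math.pi / 3)

# Assumed eigenprojector with a two-dimensional eigenspace (degenerate case).
P1 = [[1.0, 0.0, 0.0],
      [0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0]]

# Rotation mixing coordinates 2 and 3: does NOT commute with P1 (effective).
U_effective = [[1.0, 0.0, 0.0],
               [0.0, c, -s],
               [0.0, s, c]]

# Rotation inside the eigenspace: commutes with P1 (ineffective).
U_ineffective = [[c, -s, 0.0],
                 [s, c, 0.0],
                 [0.0, 0.0, 1.0]]

psi_special = [1.0, 0.0, 0.0]                  # "atypical" initial state
psi_nearby = normalized([1.0, 0.1, 0.0])       # small perturbation of it

p_special = repeat_probability(P1, U_effective, psi_special)
p_nearby = repeat_probability(P1, U_effective, psi_nearby)
p_commuting = repeat_probability(P1, U_ineffective, psi_nearby)

print(p_special, p_nearby, p_commuting)
```

In this sketch the equality p' = 1 at `psi_special` is an accident of the chosen state rather than a stable property, which is the kind of behavior the stability principle is meant to exclude.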
We have to be careful not to overgeneralize the Lüders property and the ensuing ineffectiveness property. As we discussed in the Introduction, one can think of situations where replications of a question may lead the respondent to ''change her mind.'' The most striking contrast, however, is provided by psychophysical applications of QT. Here, the inter-trial dynamics not only cannot be ignored, it must play a central role.
Let us illustrate this on an old but very thorough study by Atkinson, Carterette, and Kinchla [8]. In the experiments they report, each stimulus consisted of two side-by-side identical fields of luminance $L$, to one of which a small luminance increment $\Delta L$ could be added, serving as the signal to be detected. There were three stimuli, denoted a, b, and c. In each trial the observer indicated which of the two fields, right one or left one, contained the signal. There were thus two possible responses: Left and Right. An application of QT analysis to these experiments requires a, b, c to be translated into observables $A$, $B$, $C$, each with two eigenvalues, say, 0 = Left and 1 = Right.
In the experiments we consider, no feedback was given to the observers following a response. This is a desirable feature. It makes the sequence of trials we consider formally comparable to successive measurements of spins in quantum physics: measurements simply follow each other, with no interventions in between.
We are interested in the six measurement sequences of the form

$$(S, u, p_i) \to (S, u, p'_i), \qquad i = 1, \ldots, 6,$$

one for each combination of an observable $S \in \{A, B, C\}$ and an outcome $u \in \{0, 1\}$. Recall that the probabilities $p'_i$ ($i = 1, \ldots, 6$) are conditioned on previous measurements, so that, e.g., $p'_1 + p'_2 \ne 1$ while $p_1 + p_2 = 1$. For each observer, the probabilities were estimated from the last 400 trials out of 800 (to ensure an ''asymptotic'' level of performance). The results of one of the experiments (with equiprobable a and b), averaged over 24 observers, were as follows:

In accordance with Proposition 1, we should conclude that the inter-trial evolution (12) here intervenes always and significantly.

Consequences for ''aRbRa''-type measurement sequences
Returning to the opinion polling experiments, consider the situation involving two questions, such as a = ''Is Bill Clinton honest?'' and b = ''Is Al Gore honest?'' The two questions are posed in one of the two orders, $a \to b$ or $b \to a$, to a large group of people. As with asking the same question twice in a row, one would normally consider it unnecessary to extend these sequences by asking one of the two questions again, by repeating b or a after having asked a and b. A typical respondent, again, will be expected to repeat her first response. We find it ''almost certain'' (the ''almost'' being inserted here because we cannot refer to any systematic experimental study of this obvious expectation) that from the nonempty (in reality, vast) class of questions falling within the scope of Property 1 one can always choose pairs of questions falling within the scope of the following extension of this property. (See Remark 3.)

Property 2. Within a nonempty subclass of questions (and for the same set of inter-trial intervals) for which Property 1 holds, if a question a is asked following questions a and b (in either order), the response to it will necessarily be the same as that given to the question a the first time it was asked.
As always, we replace a, b with observables $A$, $B$, and use the following notation: the probability of obtaining a value $u$ when measuring the observable $A$ is denoted $p_{uA}$, $q_{uA}$, etc. (the letters $p$, $q$, etc. distinguishing different measurements); we use analogous notation for the probability of obtaining a value $v$ when measuring the observable $B$. Consider the measurement sequences

$$(A, u, p_{uA}) \to (B, v, p_{vB}) \to (A, u, p'_{uA}),$$

$$(B, v, q_{vB}) \to (A, u, q_{uA}) \to (B, v, q'_{vB}).$$

Property 2 implies that in these sequences $p'_{uA} = 1$ and $q'_{vB} = 1$. As it turns out, this property has an important consequence (assuming the two inter-trial intervals in the measurement sequences belong to the same class as $\Delta$ in Proposition 1).

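Proposition 2. [alternating measurements] Let the observables $A$ and $B$ have the Lüders property with respect to $\psi$, and let the properties $p'_{uA} = 1$ and $q'_{vB} = 1$ hold stable for $\psi$ in the measurement sequences above. Then $A$ and $B$ commute, $AB = BA$.

Proposition 3. Under the conditions of Proposition 2, in the two-trial measurement sequences $(A, u, p_{uA}) \to (B, v, p_{vB})$ and $(B, v, q_{vB}) \to (A, u, q_{uA})$ the joint probabilities of the two outcomes are the same,

$$p_{uA} \, p_{vB} = q_{vB} \, q_{uA}. \tag{24}$$

Consequently, the probability of a given response to a question does not depend on whether the question is asked first or second,

$$\sum_{v} q_{vB} \, q_{uA} = p_{uA}, \qquad \sum_{u} p_{uA} \, p_{vB} = q_{vB}. \tag{25}$$

The proof of the proposition is given in Text S2 Proofs. Equations (24)-(25) are empirically testable predictions. Moreover, if we assume that the questions like ''Is Clinton honest'' and ''Is Gore honest'' fall within the scope of Property 2 (and it would be amazing if they did not), these predictions are known to be de facto falsified.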
Property 3. Within a nonempty subclass of questions for which Property 2 holds (and for the same set of inter-trial intervals), the joint probability of two successive responses depends on the order in which the questions were posed.
This ''question order effect'' has in fact been presented as one for whose understanding QT is especially useful: the empirical finding that (24) fails is explained in Ref. [9] by assuming that A and B do not commute. In the survey reported by Moore [5], about 1,000 people were asked two questions, one half of them in one order, the other half in another.
As we can see, for all question pairs, the probability estimates of Yes to the same question differ depending on whether the question was asked first or second. Given the sample size (about 500 respondents per question pair in a given order) the differences are not attributable to chance variation.
Properties 1, 2, and 3 turn out to be incompatible within the framework of the conventional QT (with measurements only of the first kind). We should conclude therefore that this formalism cannot be applied to the questions that have these properties without modifications.
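The incompatibility can be seen in a small numerical sketch, assuming no inter-trial evolution (which, by Proposition 1, can be ignored for such questions). With a hypothetical pair of non-commuting projectors, the joint probability of the same two answers depends on the question order, but the answer to the first question is then not reliably repeated after the other question intervenes. With a commuting pair, the answer is repeated with probability 1, but the order effect disappears. The matrices and initial states below are illustrative assumptions, not a model fitted to any data.

```python
import math

def matvec(m, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]

def norm(x):
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def joint_probability(first, second, psi):
    """Pr[first outcome, then second outcome] = ||second first psi||^2."""
    return norm(matvec(second, matvec(first, psi))) ** 2

def repeat_after(P, Q, psi):
    """p' in the sequence A -> B -> A: ||P Q P psi||^2 / ||Q P psi||^2."""
    mid = matvec(Q, matvec(P, psi))
    return norm(matvec(P, mid)) ** 2 / norm(mid) ** 2

# Case 1 (assumed): non-commuting eigenprojectors for "Yes" to a and to b.
P_nc = [[1.0, 0.0], [0.0, 0.0]]
Q_nc = [[0.5, 0.5], [0.5, 0.5]]
psi_nc = [math.cos(0.2), math.sin(0.2)]

order_effect_nc = (joint_probability(P_nc, Q_nc, psi_nc)
                   - joint_probability(Q_nc, P_nc, psi_nc))
repeat_nc = repeat_after(P_nc, Q_nc, psi_nc)

# Case 2 (assumed): commuting (diagonal) eigenprojectors in three dimensions.
P_c = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
Q_c = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
psi_c = [1 / math.sqrt(3)] * 3

order_effect_c = (joint_probability(P_c, Q_c, psi_c)
                  - joint_probability(Q_c, P_c, psi_c))
repeat_c = repeat_after(P_c, Q_c, psi_c)

print("non-commuting:", order_effect_nc, repeat_nc)
print("commuting:    ", order_effect_c, repeat_c)
```

In the non-commuting case the order effect is present but the repetition probability falls below 1; in the commuting case the reverse holds.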

Would POVMs work?
Are there more flexible versions (generalizations) of QT that could be used instead?
One widely used generalization of the conventional QT involves replacing the projection operators with positive-operator-valued measures (POVMs), see, e.g., Refs. [10,11]. POVMs may but do not have to conform with (10)-(11)-(12). The generalized theory therefore involves measurements of both first and second kind.
The conceptual set-up here is as follows. We continue to deal with an $n$-dimensional Hilbert space ($n \ge 2$). The notion of a state represented by a unit vector $\psi$ in this space remains unchanged. The generalization occurs in the notion of an observable. For experiments with binary outcomes, an observable $A$ of the conventional QT is defined by $A = P_1$, with eigenprojectors $(P_0, P_1)$ and eigenvalues $(0, 1)$. The eigenvalues themselves are not relevant insofar as they are distinct: replacing 0, 1 with another pair of distinct values amounts to trivial relabeling of the measurement outcomes. The information about the observable $A$ therefore is contained in the eigenprojectors $P_0, P_1$. They are Hermitian positive semidefinite operators subject to the restrictions (9).
A generalized observable, or POVM, $A$ (continuing to consider only binary outcomes) is defined as a pair $(E_0, E_1)$ of Hermitian positive semidefinite operators in the $n$-dimensional Hilbert space, summing to the identity matrix $I$. In other words, the generalization from eigenprojectors $P_u$ to POVM components $E_u$ amounts to dropping the idempotency and orthogonality constraints, defined in (8).
Any component $E_u$ ($u = 0, 1$) can be presented as $M_u^\dagger M_u$, where $M_u$ is some matrix and $M_u^\dagger$ is its conjugate transpose. The representation $E_u = M_u^\dagger M_u$ for a given $E_u$ is not unique, but it is supposed to be fixed within a given experiment (i.e., for a given measurement procedure).
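The measurement formulas specifying $F$ and $G$ in (1)-(2) can now be formulated to resemble (10)-(11). The conditional probability of an outcome $u = 0, 1$ of the measurement of $A$ in state $\psi^{(t)}$ is

$$\Pr[u(A) = u \text{ in trial } t \mid \text{measurements in trials } 1, \ldots, t-1] = \langle E_u \psi^{(t)}, \psi^{(t)} \rangle = \|M_u \psi^{(t)}\|^2.$$

This measurement transforms $\psi^{(t)}$ into

$$\psi^{(t+1)} = \frac{M_u \psi^{(t)}}{\|M_u \psi^{(t)}\|}.$$

The formula for the evolution of the state vector between trials remains the same as for the conventional observables, (12).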
It is easy to see that we no longer need to involve inter-trial changes in the state vector to explain the fact that, in psychophysics, a replication of stimulus does not lead to the replication of response. In a measurement sequence $(A, u, p) \to (A, u, p')$, if $U_\Delta$ is the identity matrix, then $p'$ is given by $\langle E_u^2 \psi, \psi \rangle / \langle E_u \psi, \psi \rangle$. This value is generally different from 1, because $E_u^2$, not necessarily an orthogonal projector, is generally different from $E_u$. This is interesting, as it suggests the possibility of treating psychophysical judgments and opinion polling within the same (evolution-free) framework. This encouraging possibility, however, cannot be realized: the theory of POVMs cannot help us in reconciling Properties 2 and 3 in opinion-polling, because POVMs with the Lüders property cannot be anything but conventional observables. This is shown in the following.
Proposition 4. [no generalization] A POVM $A = (E_0, E_1)$ has the Lüders property with respect to a state $\psi$ if and only if $A$ is a conventional observable (i.e., it is a Hermitian operator, and its components $E_0, E_1$ are its eigenprojectors).
See Text S2 Proofs for a formal proof. Proposition 4 says that POVMs used to model opinion polling must be conventional observables; otherwise Property 1 will necessarily be contradicted. Put differently, the Lüders property effectively confines the measurements that can be considered within the framework of POVMs to those of the first kind. But then Propositions 1 and 2 are applicable, and they say that the inter-trial dynamics is ineffective, and that all the observables representing different questions within the scope of Property 2 pairwise commute. This, in turn, allows us to invoke Proposition 3, with the result that, contrary to Property 3, the order of the questions should have no effect on the response probabilities.
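The contrast between sharp and unsharp measurements under repetition can be checked numerically. The sketch below is an illustration, not a proof of Proposition 4: it assumes a two-dimensional real state space, no inter-trial evolution, the standard POVM update in which the state after outcome u is M_u ψ divided by its norm, and the particular choice of M_u as the positive square root of E_u. The matrices, the state `psi`, and the function names are hypothetical.

```python
import math

def apply(m, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def measure(m_u, psi):
    """Return (probability of the outcome, normalized post-measurement state).
    Assumes the outcome has nonzero probability in state psi."""
    out = apply(m_u, psi)
    n = math.hypot(out[0], out[1])
    return n * n, [out[0] / n, out[1] / n]

def repeat_probability(m_u, psi):
    """Probability of obtaining the same outcome again when the same
    measurement is repeated immediately (identity inter-trial evolution)."""
    _, post = measure(m_u, psi)
    p_again, _ = measure(m_u, post)
    return p_again

psi = [1 / math.sqrt(2), 1 / math.sqrt(2)]

# Sharp case: M_1 is an orthogonal projector (a conventional observable).
M_SHARP = [[0.0, 0.0], [0.0, 1.0]]
# Unsharp case: M_1 is the square root of E_1 = diag(0.2, 0.7).
M_UNSHARP = [[math.sqrt(0.2), 0.0], [0.0, math.sqrt(0.7)]]

p_sharp = repeat_probability(M_SHARP, psi)
p_unsharp = repeat_probability(M_UNSHARP, psi)
print(p_sharp, p_unsharp)
```

Under these assumptions the projector repeats its outcome with probability 1, whereas the unsharp component repeats it with probability 0.265/0.45, which is strictly less than 1 and so violates response replicability.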
Remark 5. Not all measurements of the second kind can be described by POVMs (see, e.g., the discussion of quantum operations in Ch. 8 of Ref. [11]). One might argue that POVMs represent the most ''typical'' quantum measurements. It remains to be seen, however, whether other generalizations or modifications of QT would lead to different results (see Conclusions).

Conclusions
Let us summarize. Both cognitive science and quantum physics deal with fundamentally probabilistic input-output relations, exhibiting a variety of sequential effects. Both deal with these relations and effects by using, in some form or another, the notion of an ''internal state'' of a system. In psychology, the maximally general version is provided by the probabilistic generalization of the old behaviorist S-O-R scheme: the probability of an output is a function of the input and the system's current state (function F in (1)), and both the input and the output change the current state into a new state (function G in (2)). If we discretize behavior into subsequent trials, then we need also a function describing how the state of the system changes between the trials (function H in (3)).
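The scheme just described can be sketched as a generic trial loop. The Python fragment below is only a schematic rendering under stated assumptions: the signatures of `F`, `G`, and `H` (in particular, `H` taking only the state, with no explicit time interval), the function `run_trials`, and the toy ''memory'' model plugged into it are all hypothetical and are not part of QT or of the paper's formal development.

```python
import random

def run_trials(state, inputs, F, G, H, rng=random.random):
    """Simulate binary responses to a sequence of inputs.

    F(state, x, u) -> probability of output u in {0, 1} given input x
    G(state, x, u) -> state immediately after responding u to x
    H(state)       -> state at the start of the next trial
    """
    outputs = []
    for x in inputs:
        u = 1 if rng() < F(state, x, 1) else 0
        outputs.append(u)
        state = H(G(state, x, u))
    return outputs, state

# Toy non-quantum model: the state is a record of past answers.
def F_memory(state, x, u):
    if x in state:                 # question seen before: repeat the answer
        return 1.0 if state[x] == u else 0.0
    return 0.5                     # new question: answer at chance

def G_memory(state, x, u):
    return {**state, x: u}         # store the answer just given

def H_identity(state):
    return state                   # nothing changes between trials

outputs, final_state = run_trials({}, ["a", "b", "a"],
                                  F_memory, G_memory, H_identity)
print(outputs, final_state)
```

In this toy model the response to the repeated question "a" always matches the first one, regardless of the intervening question "b".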
Quantum physics uses a special form of the functions F, G, and H, the ones derived from (or constituting, depending on the axiomatization) the principles of QT. Functions F and G are given by (10)-(11) in the conventional QT, and by (27)-(28) in the QT with POVMs, with the inter-trial evolution in both cases described by (12). Nothing a priori precludes these special forms of F, G, H from being applicable in cognitive science, and such applications were successfully tried: by appropriately choosing observables and states, certain experimental data in human decision making were found to conform with QT predictions [3].
As this paper shows, however, QT encounters difficulties in accounting for some very basic empirical properties. In opinion polling (more generally, in all psychological tasks where stimuli/questions can be confidently identified by features other than those being judged), there is a class of questions such that a repeated question is answered in the same way as the first time it was asked. This agrees with the Lüders projection postulate, and renders the use of both the inter-trial dynamics of the state vector and the measurements of the second kind (at least those falling within the framework of the POVM theory) unnecessary: to have this property, the questions asked have to be represented by conventional observables with ineffective inter-trial dynamics. In many situations, we also expect that for a certain class of questions the response to two replications of a given question remains the same even if we insert another question in between and have it answered. This property can only be handled by QT if the conventional observables representing different questions all pairwise commute, i.e., can be assigned the same set of eigenvectors. This, in turn, leads to a strong prediction: the joint probability of two responses to two successive questions does not depend on their order. This prediction is known to be violated for some pairs of questions. The explanation of the ''question order effect'' is in fact one of the most successful applications of QT in psychology [9], but it requires noncommuting observables, and these, as we have seen, cannot account for the repeated answers to repeated questions.
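The tension summarized above can be made concrete with a small numerical sketch. It assumes a two-dimensional real state space, Lüders (projective) updates, and no inter-trial evolution; the angles, the state `psi`, and the function names are hypothetical choices for illustration. Two noncommuting projectors produce an order effect, but the same noncommutativity breaks the replicability of the first answer once the second question has been answered in between.

```python
import math

def projector(theta):
    """Orthogonal projector onto the unit vector (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def apply(m, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def sequence_probability(projectors, psi):
    """Joint probability of answering 'yes' to each question in turn:
    the squared norm of P_k ... P_1 psi."""
    v = psi
    for p in projectors:
        v = apply(p, v)
    return v[0] ** 2 + v[1] ** 2

P_a = projector(0.0)              # 'yes' to question a
P_b = projector(math.pi / 4)      # 'yes' to question b; does not commute with P_a
psi = [math.cos(math.pi / 6), math.sin(math.pi / 6)]

p_ab = sequence_probability([P_a, P_b], psi)   # a asked first, then b
p_ba = sequence_probability([P_b, P_a], psi)   # b asked first, then a

# Conditional probability of repeating 'yes' to a after the sequence a, b.
p_repeat_a = sequence_probability([P_a, P_b, P_a], psi) / p_ab

print(p_ab, p_ba, p_repeat_a)
```

With these choices the two orders give different joint probabilities, and the probability of repeating the answer to the first question after the intervening one is 0.5 rather than 1.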
Our paper in no way dismisses the applications of QT in cognitive psychology, or diminishes their modeling value. It merely sounds a cautionary note: it seems that we lack a deeper theoretical foundation, a set of well-justified principles that would determine where QT can and where it must not be used. We should also point out that the problems identified in this paper are not unique to QT. For example, random utility theories also have difficulty explaining the trial-to-trial dependencies in answers to questions. If we assume that a response is based on a randomly sampled utility in each trial, then repeating the question will produce different random samples in each trial. That is why, in the experiments designed to test random utility models, questions are never repeated back to back, and instead ''filler trials'' are inserted to make participants forget their earlier choice.
Clearly, the basic properties that we have shown to contravene QT can be ''explained away'' by invoking considerations formulated in traditional psychological terms. One can, e.g., dismiss the problem with repeated questions in opinion polling by pointing out that the respondents ''merely'' remember their previous answers and ''simply'' do not want to contradict themselves. One can similarly dismiss the question order effect by pointing out that the first question ''simply'' changes the context for pondering the second question, e.g., reminds