Using meta-predictions to identify experts in the crowd when past performance is unknown

A common approach to improving probabilistic forecasts is to identify and leverage the forecasts of experts in the crowd based on forecasters' performance on prior questions with known outcomes. However, such information is unavailable to decision-makers for many forecasting problems, making expertise difficult to identify and leverage. In the current paper, we propose a novel algorithm for aggregating probabilistic forecasts using forecasters' meta-predictions about what other forecasters will predict. We test the performance of an extremised version of our algorithm against current forecasting approaches in the literature and show that our algorithm significantly outperforms all other approaches on a large collection of 500 binary decision problems spanning five levels of difficulty. The success of our algorithm demonstrates the potential of using meta-predictions to leverage latent expertise in environments where forecasters' expertise cannot otherwise be easily identified.


Leveraging Expertise
In this appendix, we investigate the conditions under which the meta-probability weighting (MPW) algorithm is able to leverage expertise by weighting the probabilistic predictions of experts more than novices. We show that under very general conditions, an individual with a more informative signal about the true state will be weighted more heavily by the algorithm than an individual with a less informative signal. This implies that from an ex-ante perspective, an individual who has access to a more informative information system will have a higher expected weight than an individual with a less informative information system.

Preliminaries
As is common in the information economics literature, we model expertise by considering an environment in which individuals have access to an information system (often called an experiment) in which they receive a signal that they use to update an initial prior belief [21][22][23][24]. Experts and novices are distinguished by the informativeness of their information system but are identical in all other dimensions.
We consider a Bayesian model in which a crowd of forecasters share a common prior $p(T)$ that an event is true. Each forecaster receives a private signal $S$, a random variable whose realizations lie in the set $\{s_1, \ldots, s_m\} \cup \{s_\emptyset\}$, where $0 \leq s_1 < s_2 < \cdots < s_m \leq 1$ and $s_1 < s_\emptyset < s_m$.
As our outcome space is binary, it is without loss of generality that we normalize the signals so that each signal's value equals the posterior belief that the event is true, i.e., $s_k := p(T \mid s_k)$. We let $s_\emptyset$ represent the case in which an individual receives an uninformative signal, so that $s_\emptyset := p(T)$.
The set of signals that an individual can receive depends on the information service each forecaster has access to. We assume that there are two alternative information services, one for experts and one for novices, with likelihood matrices $[Q^E_{oj}]_{2\times(m+1)}$ and $[Q^N_{oj}]_{2\times(m+1)}$. Each element of the first row of $Q^E$ and $Q^N$ represents the probability of signal $s_j$ given the outcome $o = T$; likewise, each element of the second row represents the probability of $s_j$ given the outcome $o = F$. For ease, we denote first-row elements with $T$ and second-row elements with $F$, so that $Q^E_{Tj} := p(s_j \mid T)$ and $Q^E_{Fj} := p(s_j \mid F)$, and likewise for $Q^N$.

We note two important features of an information service. First, an information service acts as a transition matrix from a state of nature to a signal, so $\sum_j Q^\tau_{oj} = 1$ for each row $o \in \{T, F\}$ and each service $\tau \in \{N, E\}$. Second, upon receiving a signal from service $\tau$, agents revise their priors using Bayes' rule. For any signal that occurs with positive probability (i.e., where $Q^\tau_{Tj} + Q^\tau_{Fj} > 0$), the posterior belief that the event is true is given by

$$p(T \mid s_j) = \frac{p(T)\, Q^\tau_{Tj}}{p(T)\, Q^\tau_{Tj} + (1 - p(T))\, Q^\tau_{Fj}}.$$

By construction, this is equal to $s_j$ for all signals that occur with positive probability.

We assume that the proportion of experts in the crowd is known to all parties and given by $\theta \in [0, 1]$. We also assume that the properties of $Q^E$ and $Q^N$ are common knowledge. We make two additional assumptions regarding the information services used by novices and experts:

Assumption 1. Information service $Q^E$ is more informative than information service $Q^N$: there exists a stochastic matrix $Z$ such that $Q^N = Q^E Z$.

Assumption 1 says that when $Q^E$ is more informative than $Q^N$, $Q^N_{oi} = \sum_k Q^E_{ok} Z_{ki}$. As we are multiplying across the rows of $Q^E$, we can interpret $Z_{ki}$ as the conditional probability that, when signal $k$ is produced by $Q^E$, signal $i$ is produced by $Q^N$.
Thus $Z_{ki} = p(s_i \mid s_k)$, and $Q^E$ is more informative than $Q^N$ if it is possible to garble the signals of $Q^E$ and generate $Q^N$. Note that $Z$ is a non-negative stochastic matrix with $\sum_i Z_{ki} = 1$.
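The garbling condition in Assumption 1 is easy to illustrate numerically. The sketch below uses hypothetical matrices (not from the paper): post-multiplying an expert likelihood matrix $Q^E$ by a stochastic matrix $Z$ yields a matrix $Q^N = Q^E Z$ that is again a valid information service, with each row a probability distribution over signals.

```python
import numpy as np

# Hypothetical 2x3 likelihood matrix for the expert service Q^E.
# Rows: outcome T then F; columns: three signal values. Numbers are illustrative.
Q_E = np.array([
    [0.6, 0.2, 0.2],  # p(s_j | T)
    [0.2, 0.2, 0.6],  # p(s_j | F)
])

# A garbling matrix Z: entry Z[k, i] = p(novice receives s_i | expert received s_k).
# Each row sums to one, so Z is a stochastic matrix.
Z = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
])

# Assumption 1: Q^N = Q^E Z. The garbled matrix is again a valid
# information service: each of its rows sums to one.
Q_N = Q_E @ Z
print(np.allclose(Q_N.sum(axis=1), 1.0))  # -> True
```

Intuitively, the novice observes a noisy relabelling of the expert's signal, which is exactly what the matrix product encodes.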
Assumption 2. Experts and novices draw independent signals: for a signal $s_i$ from $Q^N$ and a signal $s_j$ from $Q^E$, $p(s_i, s_j \mid o) = p(s_i \mid o)\, p(s_j \mid o)$ for each state $o \in \{T, F\}$.

Assumption 2 implies that although the information services are ranked, signals from the two services are independent once we condition on the state. Thus, for any signal $s_i$ drawn from $Q^t$, $t \in \{N, E\}$, and any signal $s_j$ drawn from $Q^\tau$, $\tau \in \{N, E\}$,

$$p(s_j \mid s_i, o) = p(s_j \mid o).$$

Rearranging Bayes' rule, it is the case that:

$$p(s_j \mid s_i) = p(T \mid s_i)\, p(s_j \mid T) + (1 - p(T \mid s_i))\, p(s_j \mid F) = s_i\, p(s_j \mid T) + (1 - s_i)\, p(s_j \mid F).$$

We note that Assumption 2 is also sufficient for the monotone likelihood ratio property (MLRP) to hold for signals between any two information services. This property implies that when an individual receives a high signal, he believes that other forecasters are also more likely to receive a high signal.
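The conditional signal distribution $p(s_j \mid s_i) = s_i\,p(s_j \mid T) + (1 - s_i)\,p(s_j \mid F)$ implied by Assumption 2 can be sketched with hypothetical numbers. The matrix below is chosen so that $p(T \mid s_j) = s_j$ under a uniform prior; the point is the MLRP intuition: after a high signal, a forecaster shifts probability mass toward other forecasters also receiving high signals.

```python
import numpy as np

# Hypothetical likelihood matrix, normalized so that p(T | s_j) = s_j
# under a uniform prior p(T) = 0.5. Columns correspond to s = 0.2, 0.5, 0.8.
Q = np.array([
    [0.10, 0.50, 0.40],  # p(s_j | T)
    [0.40, 0.50, 0.10],  # p(s_j | F)
])

# Conditional independence (Assumption 2) gives
# p(s_j | s_i) = s_i * p(s_j | T) + (1 - s_i) * p(s_j | F).
s_i = 0.8  # this forecaster's posterior after a high signal
p_other = s_i * Q[0] + (1 - s_i) * Q[1]

# Unconditionally, the signal probabilities are [0.25, 0.5, 0.25]; conditioning
# on the high signal shifts mass toward the high signal for the other forecaster.
print(np.round(p_other, 2))
```

Here the probability the other forecaster receives the high signal rises above its unconditional value of $0.25$, as the MLRP discussion predicts.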

The Expected Contribution of Experts and Novices
We now turn to the question of how the MPW algorithm weights experts and novices. A forecaster with signal $s_k$ will make a probabilistic forecast of $s_k$. Thus, given an outcome state $o$, the expected prediction from information service $Q^t$ is given by

$$P^t(o) = \sum_j s_j\, Q^t_{oj}.$$

Aggregating over both information services, the expected prediction of the population in state $o$ is given by

$$P(\theta \mid o) = \theta \sum_j s_j\, Q^E_{oj} + (1 - \theta) \sum_j s_j\, Q^N_{oj}.$$

In the absence of any information service, the probabilistic forecast of each individual would be $s_\emptyset$. By the law of total expectation, the posteriors are a mean-preserving spread of the prior, and thus we have

$$s_\emptyset\, P^\tau(T) + (1 - s_\emptyset)\, P^\tau(F) = s_\emptyset$$

for $\tau \in \{E, N\}$. This also implies that

$$s_\emptyset\, P(\theta \mid T) + (1 - s_\emptyset)\, P(\theta \mid F) = s_\emptyset$$

and that

$$P(\theta \mid F) = \frac{s_\emptyset\,(1 - P(\theta \mid T))}{1 - s_\emptyset}. \quad (2)$$

A forecaster with signal $s_k$'s meta-prediction about the others is equal to

$$M(\theta \mid s_k) = s_k\, P(\theta \mid T) + (1 - s_k)\, P(\theta \mid F).$$
Substituting in for $P(\theta \mid F)$ using (2), the meta-prediction of an individual with signal $s_k$ can be expressed as

$$M(\theta \mid s_k) = s_k\, P(\theta \mid T) + (1 - s_k)\, \frac{s_\emptyset\,(1 - P(\theta \mid T))}{1 - s_\emptyset}.$$

In the MPW algorithm, the weight of an individual is based on the difference between the individual's prediction and meta-prediction. For an individual with signal $s_k$,

$$s_k - M(\theta \mid s_k) = s_k\,(1 - P(\theta \mid T)) - (1 - s_k)\, \frac{s_\emptyset\,(1 - P(\theta \mid T))}{1 - s_\emptyset},$$

or, equivalently:

$$s_k - M(\theta \mid s_k) = \frac{1 - P(\theta \mid T)}{1 - s_\emptyset}\,(s_k - s_\emptyset).$$

Note first that the difference between an individual's signal and his or her meta-prediction is zero at $s_\emptyset$ and is linearly increasing in $s_k$. This feature implies that the weight of an individual with signal $s_k$, proportional to $|s_k - M(\theta \mid s_k)|$, is directly related to the informativeness of the posterior that the individual holds relative to the prior. Thus, individuals with more informative posteriors (an ex-post notion of expertise) will be weighted proportionally more than individuals with less informative posteriors.
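The linearity of the prediction/meta-prediction gap can be checked numerically. The sketch below uses hypothetical likelihood matrices (normalized so that $s_j = p(T \mid s_j)$ under a prior of $0.5$, with an assumed expert share $\theta = 0.4$) and verifies that $s_k - M(\theta \mid s_k)$ vanishes at $s_\emptyset$ and is proportional to $s_k - s_\emptyset$.

```python
import numpy as np

# Hypothetical setup: prior s_empty = 0.5, expert share theta = 0.4, and
# likelihood matrices normalized so that p(T | s_j) = s_j for s = 0.2, 0.5, 0.8.
s_empty, theta = 0.5, 0.4
s = np.array([0.2, 0.5, 0.8])
Q_E = np.array([[0.10, 0.50, 0.40],   # expert: p(s_j | T) then p(s_j | F)
                [0.40, 0.50, 0.10]])
Q_N = np.array([[0.05, 0.75, 0.20],   # novice: noisier, more mass on s_empty
                [0.20, 0.75, 0.05]])

# Expected crowd prediction in each state, aggregated over both services.
P_T = theta * (s @ Q_E[0]) + (1 - theta) * (s @ Q_N[0])
P_F = theta * (s @ Q_E[1]) + (1 - theta) * (s @ Q_N[1])

# Meta-prediction of a forecaster with signal s_k, and the prediction gap.
M = s * P_T + (1 - s) * P_F
gap = s - M

# The gap is zero at s_empty and proportional to (s_k - s_empty).
slope = (1 - P_T) / (1 - s_empty)
print(np.allclose(gap, slope * (s - s_empty)))  # -> True
```

The middle forecaster (whose posterior equals the prior) gets a gap of zero, while the forecasters at $0.2$ and $0.8$ get gaps of equal magnitude and opposite sign.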
Aggregating over all possible signals that an individual might receive, the expected weight of an individual with information service $Q^t$ is proportional to

$$\sum_j \left| s_j - M(\theta \mid s_j) \right| p^t(s_j) = \frac{1 - P(\theta \mid T)}{1 - s_\emptyset} \sum_j \left| s_j - s_\emptyset \right| p^t(s_j), \quad (4)$$

where $p^t(s_j) = s_\emptyset\, Q^t_{Tj} + (1 - s_\emptyset)\, Q^t_{Fj}$ is the unconditional probability of signal $s_j$. By Assumption 1, the posteriors induced by the expert's information service are a mean-preserving spread of those induced by the novice's. Noting that individuals with posteriors farther away from their prior will have a larger weight, it is possible to use Blackwell's theorem to show the following:

Proposition 1. The expected weight of an expert is greater than the expected weight of a novice.
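Proposition 1 can be illustrated with the same hypothetical numbers. The novice service below piles probability on the uninformative middle signal, so its expected weight, proportional to $\mathbb{E}\,|s_j - s_\emptyset|$, comes out strictly smaller than the expert's.

```python
import numpy as np

# Hypothetical matrices: prior s_empty = 0.5, signals s = 0.2, 0.5, 0.8.
s_empty = 0.5
s = np.array([0.2, 0.5, 0.8])
Q_E = np.array([[0.10, 0.50, 0.40], [0.40, 0.50, 0.10]])
Q_N = np.array([[0.05, 0.75, 0.20], [0.20, 0.75, 0.05]])

def expected_weight(Q):
    """Expected weight (up to a positive constant) of a forecaster using Q."""
    # Unconditional signal probabilities: p(s_j) = s_empty*Q_Tj + (1-s_empty)*Q_Fj.
    p = s_empty * Q[0] + (1 - s_empty) * Q[1]
    # Expected distance of the posterior from the prior, E|s_j - s_empty|.
    return p @ np.abs(s - s_empty)

# The expert's posteriors are a mean-preserving spread of the novice's,
# so the expert's expected weight is strictly larger.
print(expected_weight(Q_E), expected_weight(Q_N))
```

With these numbers the expert's expected weight is twice the novice's, because the novice lands on the uninformative signal three times as often.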
We note that Proposition 1 does not rely on there being only two types of agents in the population; it extends readily to an arbitrary number of information services ranked by informativeness. Thus, the results here are general and are likely to hold in a wide variety of problems. We also note that the MPW algorithm weights experts more than novices whether we use an ex-ante notion of expertise based on the informativeness of information services or an ex-post notion of expertise based on the difference between an individual's posterior and their prior.

Proof of Proposition 1
Proof of Proposition 1: We begin by stating Blackwell's Theorem [25]:

Blackwell's Theorem. For information service $Q^E$ to be more informative than $Q^N$, it is necessary and sufficient that the value of information under $Q^E$ is greater than the value of information under $Q^N$ for all sets of terminal actions, all utility functions, and all prior beliefs.
By Assumption 1, $Q^E$ is more informative than $Q^N$. Let the action set $a \in \{T, F\}$ correspond to voting on whether an answer is true or false, and consider a utility function $U(a, o)$ that maps actions and states of the world into payoffs. Let $U(T, T) = 1 - s_\emptyset$, $U(T, F) = -s_\emptyset$, $U(F, T) = -(1 - s_\emptyset)$, and $U(F, F) = s_\emptyset$. Given a signal $s_i$, expected utility is maximized by choosing $a = T$ when $s_i \geq s_\emptyset$ and $a = F$ otherwise.
The expected utility of this strategy when the posterior is less than $s_\emptyset$ is given by

$$EU(s_i) = s_i\, U(F, T) + (1 - s_i)\, U(F, F) = s_\emptyset - s_i.$$

Likewise, the expected utility of this strategy given a posterior greater than $s_\emptyset$ is

$$EU(s_i) = s_i\, U(T, T) + (1 - s_i)\, U(T, F) = s_i - s_\emptyset.$$

Thus, we can express the expected utility of an individual with signal $s_i$ as

$$EU(s_i) = \left| s_i - s_\emptyset \right|.$$

By Blackwell's theorem, the expected utility of information service $Q^E$ is higher than the expected utility of information service $Q^N$ for any utility function and any prior belief. Using an initial prior of $p(T) = s_\emptyset$, this implies

$$\sum_j \left| s_j - s_\emptyset \right| p^E(s_j) \geq \sum_j \left| s_j - s_\emptyset \right| p^N(s_j),$$

where, by the law of iterated expectations, $p^\tau(s_j) = s_\emptyset\, Q^\tau_{Tj} + (1 - s_\emptyset)\, Q^\tau_{Fj}$ is the unconditional probability of signal $s_j$ under service $\tau$. Using the result in (4) above, this implies that

$$\sum_j \left| s_j - M(\theta \mid s_j) \right| p^E(s_j) \geq \sum_j \left| s_j - M(\theta \mid s_j) \right| p^N(s_j).$$

Noting that each side of this inequality is identical to the expected weight of an individual from information service $Q^E$ and $Q^N$, respectively, the expected weight of an expert is greater than the expected weight of a novice. $\square$
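A quick numerical check of this construction, under the assumption that the voting payoffs are $U(T,T) = 1 - s_\emptyset$, $U(T,F) = -s_\emptyset$, $U(F,T) = -(1 - s_\emptyset)$, and $U(F,F) = s_\emptyset$: the optimal vote then earns expected utility $|s_i - s_\emptyset|$ at every posterior.

```python
# Hypothetical payoffs for the voting problem used in the proof:
# U(T,T) = 1 - s_empty, U(T,F) = -s_empty, U(F,T) = -(1 - s_empty), U(F,F) = s_empty.
s_empty = 0.5
for s in [0.1, 0.3, 0.5, 0.7, 0.9]:
    eu_vote_T = s * (1 - s_empty) + (1 - s) * (-s_empty)   # equals s - s_empty
    eu_vote_F = s * -(1 - s_empty) + (1 - s) * s_empty     # equals s_empty - s
    # Voting optimally (T iff s >= s_empty) earns |s - s_empty|.
    assert abs(max(eu_vote_T, eu_vote_F) - abs(s - s_empty)) < 1e-12
print("ok")
```

This is exactly the payoff structure under which Blackwell's theorem delivers the ranking of $\mathbb{E}\,|s - s_\emptyset|$ across the two services.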