Fig 1.

Accuracy, calibration and informativeness for the IDEA protocol explained.

The graph shows four hypothetical experts, their best estimates (black dots), and their credible intervals (horizontal lines). The red dashed vertical line represents the realised truth. Expert A has a best estimate close to the realised truth, and their interval captures the realised truth (which, over many questions, contributes towards their calibration). Expert A is also informative (narrower intervals) relative to Expert B, although Expert B is more accurate (provides a best estimate closer to the realised truth). Expert C is informative but is not accurate, and their interval does not capture the realised truth (calibration). Expert D is accurate and informative, but their bounds do not capture the realised truth (calibration).
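The three properties in this caption can be illustrated with a minimal Python sketch. All numbers, the `Estimate` class, and the helper names are hypothetical and chosen only to mimic the pattern described for Experts A to D. The plain absolute error used here is a stand-in for accuracy and is not the scoring rule used in the study.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    lower: float  # lower bound of the credible interval
    best: float   # best estimate
    upper: float  # upper bound of the credible interval


def captures(e: Estimate, truth: float) -> bool:
    """True if the interval contains the realised truth (a 'hit')."""
    return e.lower <= truth <= e.upper


def abs_error(e: Estimate, truth: float) -> float:
    """Distance of the best estimate from the truth (smaller = more accurate)."""
    return abs(e.best - truth)


def width(e: Estimate) -> float:
    """Interval width (narrower = more informative)."""
    return e.upper - e.lower


def hit_rate(estimates, truths) -> float:
    """Proportion of questions whose interval captured the truth."""
    hits = [captures(e, t) for e, t in zip(estimates, truths)]
    return sum(hits) / len(hits)


TRUTH = 10.0  # hypothetical realised value

# Hypothetical experts mirroring the caption's description.
experts = {
    "A": Estimate(8.0, 10.5, 12.0),   # captures the truth, fairly narrow
    "B": Estimate(4.0, 10.1, 18.0),   # closest best estimate, wide interval
    "C": Estimate(14.0, 15.0, 16.0),  # narrow, but far from the truth
    "D": Estimate(10.2, 10.3, 11.0),  # close and narrow, but misses the truth
}

for name, e in experts.items():
    print(name, captures(e, TRUTH), round(abs_error(e, TRUTH), 2), round(width(e), 2))
```

Calibration only becomes meaningful over many questions: `hit_rate` would be applied to one expert's intervals across all questions and compared with the stated credible level.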

Fig 2.

Three essential elements of good expert judgement under uncertainty.

Fig 3.

Key steps of the IDEA protocol used in this study and outlined above.

In this study we used the four-step elicitation (step 2), which is outlined in Fig 4 below. The question format produces a best estimate (black dots in step 3) with associated credible upper and lower estimates from individuals (horizontal lines in step 3). These are aggregated to form group judgements (estimates marked with red dots in step 3). The results are then discussed by the group, and individuals can update their estimates (black dots and horizontal lines in step 5). These Round 2 judgements are then aggregated (red dots and horizontal lines in step 5) and taken as the final estimate. A practical guide to the protocol is provided in [87].

Table 1.

A summary of the 14 biotic and abiotic questions asked of participants during the elicitation.

Fig 4.

In this elicitation we used the four-step question format [77] outlined in this figure to derive a best estimate (black dot) and upper and lower credible intervals (horizontal lines).

Fig 5.

Graphical feedback provided to one group of participants with their Round 1 estimates standardised to 80% credible intervals.

The circles represent their best estimates. Note the estimates are plotted on a non-linear square root scale (as this provided the clearest representation of the spread of the estimates). The table below the graph was included to clearly show participants (in numbers) the effect of the standardisation on their upper and lower bounds. CoTS = Crown-of-thorns starfish.
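The caption does not say how intervals were standardised to 80%. One possible approach, shown here purely as an assumption, is linear extrapolation: if a participant states how confident they are that their interval contains the truth, the distance from the best estimate to each bound is rescaled by the ratio of the target level to the stated level. The function name, parameters, and example numbers below are hypothetical.

```python
def standardise_interval(lower, best, upper, stated_conf, target_conf=80.0):
    """Rescale an interval to a target credible level.

    Assumption (not taken from the source): linear extrapolation around
    the best estimate, with confidence levels given in percent.
    """
    if not 0 < stated_conf <= 100:
        raise ValueError("stated_conf must be in (0, 100]")
    scale = target_conf / stated_conf
    new_lower = best - (best - lower) * scale
    new_upper = best + (upper - best) * scale
    return new_lower, new_upper


# Hypothetical participant: interval 5 to 20, best estimate 10, 50% confident.
# Widening to an 80% level stretches both sides by 80 / 50 = 1.6.
print(standardise_interval(5.0, 10.0, 20.0, stated_conf=50.0))
```

Under this assumption, a participant who stated more than 80% confidence would have their interval narrowed, and one who stated less would have it widened, which is the kind of effect the table under the graph is described as showing in numbers.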

Table 2.

An example of comments provided by participants when making their Round 1 estimates, and subsequent comments received during the remote discussion phase.

CoTS = crown-of-thorns starfish.

Fig 6.

An example of judgements provided by each group for Question 1 of the elicitation.

The question asked for the average density of crown-of-thorns starfish (CoTS) that would be detected per 2-minute manta-tow on Rib Reef, Queensland, Australia, by the Australian Institute of Marine Science in 2016. The graph shows the estimates (best estimate, with 80% upper and lower credible intervals) provided by participants in Round 1 (R1 Ind), and then estimates provided in Round 2 (R2 Ind). Participants who withdrew following Round 1 (R1 Ind (W) and R2 Ind (W)) were not included in the group aggregation (R1 Mean and R2 Mean). A ninth group (‘Super Group’) was created from the aggregation of all 58 participants who took part in Round 1 (R1 Ind) and Round 2 (R2 Ind). The realised truth (0.14 CoTS) is displayed as a red vertical line. Note that the x-axis uses a non-linear (square root) scale.

Fig 7.

An example of feedback provided for Group 1 for each of the 14 questions.

The graph shows the estimates (best estimate, with 80% upper and lower credible intervals) provided by participants in Round 1 (R1 Ind), and then estimates provided in Round 2 (R2 Ind). Participants who withdrew following Round 1 (R1 Ind (W) and R2 Ind (W)) were not included in the group aggregations (R1 Mean and R2 Mean). The realised answers for each question are displayed above the graphs and indicated by the red vertical line. Note that the x-axis uses a non-linear (square root) scale.

Fig 8.

Relationship between self-rating (0 = no experience, 10 = specialist understanding, e.g. regularly collect data, prepare or sign off on reports, or provide advice on this topic) and accuracy (lower number = more accurate) for each of the 76 participants, across five different subject domains in Round 1.

‘Q’ indicates the number of questions from which accuracy was scored. The linear models revealed slopes between less than -0.002 and 0 and adjusted R² values between -0.02 and 0.02 (not significant at the 0.05 level). Spearman’s rank correlations ranged between -0.01 and -0.17, none of which were significant at the 0.05 level (two-tailed test).

Fig 9.

There was no detectable difference in the accuracy (ALRE), calibration or informativeness of those recommended as experts or novices.

In fact, some of the most accurate (lower ALRE score) and well-calibrated (a score of 0.8 represents perfect calibration) individuals were sourced through other means.

Fig 10.

Women were on average more accurate (lower ALRE score), better calibrated (a score of 0.8 represents perfect calibration) but less informative than men (higher numbers relate to less informative individuals).

Fig 11.

Comparison of the accuracy, calibration and informativeness of individuals and groups.

The graphs show that groups were generally more accurate (lower number) than the median individual. Groups had similar calibration and informativeness scores to individuals, but consistently lower variance (MAD).

Fig 12.

Scatterplots show the change of each individual (n = 58) in Round 2 across the three variables (accuracy, calibration, and informativeness).

Dots that fall below the line for accuracy or informativeness show that individuals improved their scores on these measures. For calibration, dots above the line indicate that individuals increased the number of realisations captured between their upper and lower bounds (a score of 0.80 represents perfect calibration).

Fig 13.

Scatterplots show the difference between groups and individuals in Round 1 and Round 2 (note only those who submitted answers in Round 1 and Round 2 were included (n = 58)).

The horizontal grey line represents the median accuracy score of participants in Round 2 (lower scores are more accurate), and the vertical line represents perfect calibration (0.80). Groups were on average slightly more accurate and better calibrated in Round 2 than in Round 1. The black triangle represents a super-group, formed by aggregating (arithmetic mean) the estimates of all 58 participants for each question before scoring the resulting estimates against the realised value.
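The arithmetic-mean aggregation described for the super-group can be sketched as follows. This is a minimal illustration that assumes the lower bound, best estimate, and upper bound are each averaged separately across participants; the function name and the example values are hypothetical.

```python
from statistics import fmean


def aggregate(estimates):
    """Average (lower, best, upper) triples across participants for one question.

    Assumption: each of the three quantities is averaged independently.
    """
    lowers, bests, uppers = zip(*estimates)
    return fmean(lowers), fmean(bests), fmean(uppers)


# Hypothetical estimates from two participants for a single question.
question_estimates = [
    (0.0, 1.0, 2.0),
    (2.0, 3.0, 6.0),
]

group_lower, group_best, group_upper = aggregate(question_estimates)
print(group_lower, group_best, group_upper)
```

The aggregated triple is then scored against the realised value in the same way as an individual's estimate, which matches the order described in the caption (aggregate first, score second).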

Fig 14.

Changes in accuracy (distance from the realised truth), in Round 2, for individuals (left) and groups (right).

Units for the y-axis are the density of CoTS (crown-of-thorns starfish) per 2-minute manta-tow. Note the scale is a non-linear (square root) scale. An improvement indicates that revised estimates were closer to the truth than the estimates provided in Round 1. To put these numbers into perspective, minimum thresholds were developed for each question. For Question 1, the threshold was 0.22 CoTS per 2-minute manta-tow, which, according to the Australian Institute of Marine Science, indicates an incipient outbreak. Dots at or above this line indicate changes that were above this minimum threshold. The graph shows that more individuals improved their accuracy in Round 2 than reduced it. When changes were made, they were usually above the assigned thresholds, and for some individuals the improvement in accuracy was substantial (59.90 CoTS per 2-minute manta-tow). The graph also shows that for this question more groups improved than reduced their accuracy, and the amount by which they improved was above the assigned threshold. Graphs for each of the questions can be found in S3 File.

Fig 15.

The proportion of questions where the best estimate was updated by individuals and groups for which updating improved the accuracy of the best estimate.

Fig 16.

In Round 1, there was no difference in accuracy and calibration between those who withdrew and those who remained.

However, those who updated their estimates in Round 2 became on average better calibrated and more accurate than those who withdrew.
