Expressing uncertainty in Human-Robot Interaction

Most people struggle to understand probability, which is an issue for Human-Robot Interaction (HRI) researchers who need to communicate risks and uncertainties to the participants in their studies, the media, and policy makers. Previous work has shown that even the use of numerical values to express probabilities does not guarantee an accurate understanding by laypeople. We therefore investigate whether verbal phrases, such as “likely” and “almost certainly not”, can be used to communicate probability. We embedded these phrases in the context of the use of autonomous vehicles. The results show that the association of phrases with percentages is not random and that there is a preferred order of phrases. The association is, however, not as consistent as hoped for. Hence, it would be advisable to complement verbal expressions of uncertainty with numerical ones. This study provides an empirically verified list of probability phrases that HRI researchers can use to complement numerical values.


Reviewer 1
One major concern from me is that the conclusion is very weak. Through the user studies, there seems no clear conclusion on which way is better in expressing uncertainties.
This is not a weak conclusion at all. Our study does indeed show that there is little difference between numerical and verbal expressions. This is a surprising and interesting result. Science is not the hunt for statistically significant differences, but an attempt to learn. A negative result is still a result, and the dangers of reporting only (positive) statistically significant results are now widely known. PLOS ONE is one of the leading journals to declare clearly that all papers that make a valuable contribution to the scientific literature, are replicable, are clearly written, and whose conclusions are supported by the data deserve publication (https://doi.org/10.1371/journal.pbio.0040401).
The decision trees that the authors proposed did not outperform the naive method. Hence, I think that the authors should try to explore more aspects of the work and make it contribute more to the community.
This comment is unhelpfully vague. The naive method and the trees are both statistical modelling methods, which make certain assumptions and are appropriate for the data set in question. We thus applied both. The fact that the two models perform equally well does not mean that further modelling is needed. It would help if the reviewer could be more explicit about what (s)he means by "exploring more aspects of the work and making it contribute more to the community".
Some minor comments focus on the writing in the conclusion part. Some of the expressions are very confusing to read. For instance, I cannot interpret the exact meaning of "we can predict the percentage response exactly around 50% of the time and within 10% of the true response 80% of the time".
We agree that the sentences make for slow reading, but we are not sure what causes the confusion. The first part means what it says: we can use our model to predict what percentage probability a respondent will assign to a given verbal prompt. Our "guess" will be exactly right half the time and will be within 10% of the actual response 80% of the time.
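As a minimal sketch of how the two figures are computed (with invented numbers, not the study's data, and reading "within 10%" as within ten percentage points):

```python
# Hypothetical predicted and actual percentage responses; not the study's data.
predicted = [50, 70, 90, 10, 30]   # model's percentage predictions
actual = [50, 75, 60, 10, 35]      # respondents' actual percentage answers

# Share of predictions that are exactly right.
exact = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
# Share of predictions within ten percentage points of the true response.
within_10 = sum(abs(p - a) <= 10 for p, a in zip(predicted, actual)) / len(actual)

print(exact, within_10)  # 0.4 0.8
```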
The authors should summarize their results and findings in a more clear way.
This is a very general suggestion, and it would be most helpful if the reviewer could let us know what he/she thinks is unclear. We could then focus our attention on those issues.

Reviewer 2
Make sure that the meaning of the measurement is clear when presenting percentages. For example, a very brief explanation of what the apparent error rate is measuring could help understanding.
We define the apparent error rate in line 153 as the proportion of incorrect predictions. This is a standard textbook definition, and we are not sure that it can be said any more clearly.
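For completeness, a minimal sketch of that definition (with hypothetical labels, not the study's data):

```python
def apparent_error_rate(y_true, y_pred):
    """Proportion of incorrect predictions, computed on the data used to fit the model."""
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return errors / len(y_true)

# One of four hypothetical predictions is wrong, so the AER is 0.25.
print(apparent_error_rate(["likely", "unlikely", "even", "even"],
                          ["likely", "unlikely", "likely", "even"]))  # 0.25
```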
Discuss how the decision tree is constructed in more detail.
In line 132 we explicitly refer to the textbook by Friedman, Hastie, and Tibshirani, which explains how the trees are constructed. A full tutorial would be necessary to understand the method completely, and explaining it goes beyond the scope of this paper. An interested reader can simply use the existing function in the statistical software R, which has exactly the same format as the function for fitting a linear model; anyone interested in the theory or the algorithmic implementation can refer to the above textbook or to the many tutorials available on the internet.
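To illustrate the one-call interface we have in mind, here is a sketch in Python using scikit-learn with invented data (the paper itself uses R, whose tree-fitting function is called in the same way as a linear model):

```python
from sklearn.tree import DecisionTreeClassifier

# Invented example data: (percentage assigned, context code) -> verbal phrase.
X = [[90, 0], [85, 1], [50, 0], [55, 1], [10, 0], [5, 1]]
y = ["almost certain", "almost certain", "even chance", "even chance",
     "unlikely", "unlikely"]

# Fitting a classification tree is a single call, analogous to fitting a linear model.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[88, 0]])[0])  # "almost certain"
```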
More discussion of the low p-values (line 304) would be useful. I would make it more clear why more emphasis is put on the other statistical measures instead of the p-value. You could also discuss more about the percentage values in Fig 9 and 10.
In line 304 we do not speak of p-values, but rather importance scores, which we introduce in lines 155-165. We do understand, however, that readers used to seeing p-values might automatically interpret them as such. We have therefore changed the text in lines 301-316 and the table legend to emphasize that we are talking about importance scores rather than p-values. We have also added another reference to Lin 2013, which explains why p-values are not useful for large data sets.
Minor comments: In line 190, there may be a typo in the prompt as it is not grammatically correct. I'd recommend including the abbreviation (AER) after "apparent error rate" at the beginning of sentence in line 284 for ease of reading.
We added the AER acronym as suggested.
The sentence on line 362 which starts "There appears to be more points..." is a strange phrase to use since it could be quantitatively determined whether or not the upper quartile of Fig 4 has more points than Fig 2.
We agree and have changed the sentence to "There are more points...".

Reviewer 3
We changed all the prompt labels to "uncertainty".
We have realised that the Agent does not appear in the decision tree in Figure 9 (as the variable does not have a strong enough effect on choice). We have therefore corrected the legend in Figure 9 to read 1 = you, 2 = autonomous vehicle.