The efficacy of Euler diagrams and linear diagrams for visualizing set cardinality using proportions and numbers

This paper presents the first empirical investigation that compares Euler and linear diagrams when they are used to represent set cardinality. A common approach is to use area-proportional Euler diagrams, but linear diagrams can exploit length-proportional straight lines for the same purpose. Another common approach is to use numerical annotations. We first conducted two empirical studies, one on Euler diagrams and the other on linear diagrams. These suggest that area-proportional Euler diagrams with numerical annotations and length-proportional linear diagrams without numerical annotations support significantly better task performance. We then conducted a third study to investigate which of these two notations should be used in practice. This suggests that area-proportional Euler diagrams with numerical annotations most effectively support task performance and so should be used to visualize set cardinalities. However, these studies focused on data that can be visualized reasonably accurately using circles, and the results should be taken as valid within that context. Future work needs to determine whether the results generalize both to cases where circles cannot be used and to other ways of encoding cardinality information.

• Appendix A contains information on how the questions were generated for the tasks in the study.
• Appendix B presents the examples used for training participants.
• Appendix C presents the questions and associated diagrams used in the studies.
• Appendix D provides information on the statistical methods used to analyse the data.
• Appendices E-G present all of the statistical output generated from the methods employed for the three studies.
All of the diagrams included in this document are scaled to 50% of the actual size used in the study in order to fit within the page width.

A - Question Generation
For the S-type questions, a random permutation of the list of labels was generated using the website www.random.org. The first label in this permuted list then became the label which would appear in the question, and the next four labels in the permuted list were the labels which would appear as check-boxes, alongside the "None of the above" option. (Where there were only five sets in the diagram, the second step was redundant.) For the I-type questions, a random permutation of the list of those intersections containing at most three labels was generated using the website www.random.org. The first intersection in this permuted list became the intersection which would appear in the question, and the next four intersections in the permuted list were the intersections which would appear as check-boxes, alongside the "None of the above" option.
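The S-type procedure described above can be sketched as follows. This is a minimal illustration rather than the tooling actually used: it substitutes Python's pseudo-random generator for www.random.org, and the function name `generate_s_type_question` and the example label list are hypothetical.

```python
import random


def generate_s_type_question(labels, rng):
    """Pick a question label and four check-box labels from one random permutation.

    Hypothetical helper: the study used www.random.org, not Python's PRNG.
    """
    if len(labels) < 5:
        raise ValueError("need at least five labels: one for the question, four for check-boxes")
    permuted = list(labels)
    rng.shuffle(permuted)  # random permutation of the label list
    question_label = permuted[0]  # first label appears in the question
    checkbox_labels = permuted[1:5]  # next four labels become check-boxes
    return question_label, checkbox_labels + ["None of the above"]


# Illustrative labels only; any list of at least five set labels works.
labels = ["Android", "Bands", "Cars", "Design", "Economics", "Games", "Health"]
question_label, options = generate_s_type_question(labels, random.Random(0))
print(question_label, options)
```

The same pattern would apply to the I-type questions by passing a list of intersections (each containing at most three labels) instead of single labels.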
A check was performed on the number of check-boxes that appeared in the correct answer, which ranged from 0 (representing "None of the above") to 4. These numbers were: • 0/"None of the above":

B - Training for Studies A, B & C
This section contains the data presented to participants for the purposes of training. After attempting each question, participants were given the correct answer, an explanation of the correct answer (with reference to the diagram) and a modified diagram with the regions of interest highlighted. An exception was for training question 4, as the correct answer was "None of the above", and thus no regions were highlighted. These modified diagrams for training questions 1 to 3 are presented alongside the respective question.

Question 1
• Question: Tick the check boxes where more people have exactly that combination of interests than Cars and Economics only.
• Check boxes to be ticked: Bands
• Task Type: Intersection comparison - More than.

Question 2
• Question: Tick the check boxes where the total number of people interested in that topic is less than the total number of people interested in Economics.
• Check boxes to be ticked: Android, Design
• Task Type: Set comparison - Less than.
• SNAP Data Set Used: 16652550
[Diagrams: ED-N; ED-N: Explanation; ED-P]

Question 3
• Question: Tick the check boxes where more people have exactly that combination of interests than Games only.
• Check boxes to be ticked: Books, Design and Health
• Task Type: Intersection comparison - More than.

Question 4
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in iPhone.
• Check boxes to be ticked: None of the above
• Task Type: Set comparison - More than.
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Relaxation.
• Check boxes to be ticked: Android, Bands, College, Travel
• Task Type: Set comparison - More than.

Question 2
• Question: Tick the check boxes where the total number of people interested in that topic is less than the total number of people interested in Programming.
• Check boxes to be ticked: Bands, Games
• Task Type: Set comparison - Less than.

Question 3
• Question: Tick the check boxes where more people have exactly that combination of interests than News only.
• Check boxes to be ticked: None of the above
• Task Type: Intersection comparison - More than.

Question 5
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Relaxation.
• Check boxes to be ticked: Hifi, iPhone, Media
• Task Type: Set comparison - More than.

• Question: Tick the check boxes where the total number of people interested in that topic is less than the total number of people interested in Stars.
• Check boxes to be ticked: Games
• Task Type: Set comparison - Less than.

• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Games.
• Check boxes to be ticked: Android, College, Economics, Hifi
• Task Type: Set comparison - More than.

Question 10
• Question: Tick the check boxes where the total number of people interested in that topic is less than the total number of people interested in Camping.
• Check boxes to be ticked: None of the above
• Task Type: Set comparison - Less than.

Question 11
• Question: Tick the check boxes where more people have exactly that combination of interests than Technology only.
• Check boxes to be ticked: None of the above
• Task Type: Intersection comparison - More than.
• SNAP Data Set Used: 18534908

Question 13
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Relaxation.
• Check boxes to be ticked: Food, iPhone, Travel
• Task Type: Set comparison - More than.
• SNAP Data Set Used: 64441390

Question 14
• Question: Tick the check boxes where the total number of people interested in that topic is less than the total number of people interested in Stars.
• Check boxes to be ticked: Music
• Task Type: Set comparison - Less than.

Question 17
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Books.
• Check boxes to be ticked: Hifi, iPhone
• Task Type: Set comparison - More than.

Question 19
• Question: Tick the check boxes where more people have exactly that combination of interests than Camping, Games and Relaxation only.
• Check boxes to be ticked: News, Hifi, Books
• Task Type: Intersection comparison - More than.

Question 21
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Bands.
• Check boxes to be ticked: Games, Music
• Task Type: Set comparison - More than.

Question 22
• Question: Tick the check boxes where the total number of people interested in that topic is less than the total number of people interested in Design.
• Check boxes to be ticked: News
• Task Type: Set comparison - Less than.

Question 25
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Health.
• Check boxes to be ticked: Economics, Internet, Media
• Task Type: Set comparison - More than.

Question 29
• Question: Tick the check boxes where the total number of people interested in that topic is greater than the total number of people interested in Web.
• Check boxes to be ticked: Design, Health, Stars
• Task Type: Set comparison - More than.

Question 30
• Question: Tick the check boxes where the total number of people interested in that topic is less than the total number of people interested in Economics.
• Check boxes to be ticked: College, Programming
• Task Type: Set comparison - Less than.

D - Statistical Methods
Here we present in detail the statistical models used for the first two studies, each of which included three treatment groups. The models used for the last study were simpler, as there were only two treatments.
For each of the three empirical studies, we employed two local odds ratios generalized estimating equations models [2] to analyse the accuracy data. An ANOVA calculation was not appropriate for these data, as they violated the normality assumption of an ANOVA test. The non-parametric version of ANOVA, Kruskal-Wallis, was also not appropriate, as the responses for each individual are correlated, and thus not independent. The first model, for the first two studies, was employed to answer RQ1, RQ2 and RQ3 and compared the visualization types overall, irrespective of task category. In this model:
• $\Pr(Y_{ik} \le j)$ is the probability for subject $i$ to provide at most $j - 1$ correct answers to question $k$, where $j = 1, \ldots, 5$,
• $x_{ik1}$ is the indicator that the diagram given to subject $i$ for answering question $k$ was proportional, and
• $x_{ik2}$ is the indicator that the diagram given to subject $i$ for answering question $k$ was both proportional and numerical,
for $i = 1, \ldots, n$, given $n$ participants, and $k = 1, \ldots, 32$.
With this model, we could determine whether the odds of providing $j$ or fewer correct answers for one of the visualization types (proportional, numerical, or both) were significantly different from the others while taking into account the expected correlation among the responses provided by each individual participant.
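As a sketch of the model form these definitions imply, assuming a cumulative logit link with one intercept per cut-point (the coefficient names $\beta_{0j}$, $\beta_1$ and $\beta_2$ are assumptions introduced here, not notation taken from the original):

```latex
\log\left[\frac{\Pr(Y_{ik} \le j)}{1 - \Pr(Y_{ik} \le j)}\right]
  = \beta_{0j} + \beta_1 x_{ik1} + \beta_2 x_{ik2},
  \qquad j = 1, \ldots, 5 .
```

Under this form, $\exp(\beta_1)$ would be the odds ratio of providing at most $j - 1$ correct answers with a proportional diagram relative to the reference (numerical-only) diagram. Five intercepts plus two slopes give seven coefficients, which agrees with the seven-element contrast vectors passed to ComparisonStats in the Study A output.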
The second model, for the first two studies, was employed to see whether the answers to RQ1, RQ2 and RQ3 still held when we take into account task category (set or intersection). The variables are as above and, in addition, $x_{ik3}$ is the indicator that question $k$ given to subject $i$ was of type I. This model allowed us to estimate the odds of providing $j$ or fewer correct answers with one combination of visualization type and task category compared to other combinations, and to determine whether significant differences existed.
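A form of this second model consistent with the description might look as follows. The interaction terms and the coefficient names $\beta_3$, $\beta_4$ and $\beta_5$ are assumptions: the text states only that the model compares combinations of visualization type and task category.

```latex
\log\left[\frac{\Pr(Y_{ik} \le j)}{1 - \Pr(Y_{ik} \le j)}\right]
  = \beta_{0j} + \beta_1 x_{ik1} + \beta_2 x_{ik2} + \beta_3 x_{ik3}
  + \beta_4 x_{ik1} x_{ik3} + \beta_5 x_{ik2} x_{ik3},
  \qquad j = 1, \ldots, 5 .
```

With interaction terms of this kind, the effect of visualization type is allowed to differ between set and intersection tasks. For example, the log odds ratio for proportional versus numerical-only diagrams would be $\beta_1$ for type S and $\beta_1 + \beta_4$ for type I.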
For the time data, we used two generalized estimation models [1] that allowed us to estimate whether the time taken to provide answers was significantly different. Again, we present details of the two models used for the first two studies. Following a similar approach to that for the accuracy data, the first of the two models directly addressed RQ1, RQ2 and RQ3, and the more complex model delved deeper into the data to see whether task category was important. The more complex model accounts for the different combinations of visualization type (proportional, numerical, or both) and task category (set or intersection).
Statistical output is included in the submitted supplementary material. We report on the main findings in the associated paper.
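One plausible form of the more complex time model is sketched below. It assumes a log link, as in the Study C time model that begins $\log(t_{ik}) = \delta_0 + \delta_1$, and six coefficients, as suggested by the six-element contrast vectors used with the fitted time model in the Study A output. The names $\delta_2, \ldots, \delta_5$ and the exact interaction structure are assumptions.

```latex
\log(t_{ik}) = \delta_0 + \delta_1 x_{ik1} + \delta_2 x_{ik2} + \delta_3 x_{ik3}
  + \delta_4 x_{ik1} x_{ik3} + \delta_5 x_{ik2} x_{ik3}
```

Here $t_{ik}$ denotes the time taken by subject $i$ on question $k$. Under this form, $\exp(\delta_1 - \delta_2)$ would be the ratio of expected times for proportional versus proportional-and-numerical diagrams on type S questions.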

E - Study A (ED) - Statistical Analysis
Here we present the statistical models used specifically for study 1 and the associated output.

Models
For the overall comparison, a model was fitted to the accuracy data, where
• $\Pr(Y_{ik} \le j)$ is the probability for participant $i$ to provide at most $j - 1$ correct answers to question $k$, where $j = 1, 2, 3, 4, 5$,
• $x_{ik1}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was ED-P,
• $x_{ik2}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was ED-P&N,
for $i = 1, \ldots, 277$ and $k = 1, \ldots, 32$.
For the overall comparison, a model was also fitted to the time data.
For the set and intersection comparisons, a model was fitted to the accuracy data, where
• $\Pr(Y_{ik} \le j)$ is the probability for participant $i$ to provide at most $j - 1$ correct answers to question $k$, where $j = 1, 2, 3, 4, 5$,
• $x_{ik1}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was ED-P,
• $x_{ik2}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was ED-P&N,
• $x_{ik3}$ is the indicator that question $k$ given to participant $i$ was of type I,
for $i = 1, \ldots, 277$ and $k = 1, \ldots, 32$.
For the set and intersection comparisons, a model was also fitted to the time data.

Overall results
The accuracy analysis:
# ED-P vs ED-N
ComparisonStats(fitmodel,c(0, 0, 0, 0, 0, 1, 0))
## Estimate 95% LB 95% UB p-value
## 1.315777 1.058239 1.635991 0.013500
# ED-P&N vs ED-N
ComparisonStats(fitmodel,c(0, 0, 0, 0, 0, 0, 1))
## Estimate 95% LB 95% UB p-value
## 0.8866999 0.6888363 1.1413985 0.3506000
# ED-P vs ED-P&N
ComparisonStats(fitmodel,c(0, 0, 0, 0, 0, 1, -1
The inference is based on the model-based estimated odds ratios (the value under Estimate). Results are declared statistically significant only if the corresponding p-value is less than 0.05 and/or the corresponding 95% confidence interval does not contain 1. The two middle columns correspond to the lower and upper bounds of the 95% confidence interval for the odds ratios. The interpretation of the above analysis is the following: The estimated odds of having j or fewer correct answers with ED-P diagrams are 1.3158 times those with ED-N diagrams (p-value = 0.0135). The estimated odds of having j or fewer correct answers with ED-P diagrams are 1.4839 times those with ED-P&N diagrams (p-value = 0.0006). In other words, ED-N and ED-P&N diagrams are more likely to produce a higher number of correct answers than ED-P diagrams.
The fitted probabilities for j for each group are
X1 <-matrix(fitted(fitmodel)[c(1000,8000,3000)
The time analysis:
The inference is based on the model-based estimated ratios (the value under Estimate). Results are declared statistically significant only if the corresponding p-value is less than 0.05 and/or the corresponding 95% confidence interval does not contain 1. The p-values and the 95% confidence intervals suggest that participants needed less time with ED-P diagrams than with ED-P&N or ED-N diagrams and that there is no significant difference between ED-P&N and ED-N diagrams.
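The significance rule stated above (p-value below 0.05, or a 95% confidence interval that excludes 1) can be checked against the reported accuracy output with a short sketch. It assumes the intervals are Wald intervals computed on the log scale, which the reported numbers are consistent with but which is not stated in this document; the function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(estimate, lower, upper, z_crit=1.959964):
    """Back out the log-scale standard error, z statistic and two-sided p-value.

    Assumes a symmetric Wald interval on the log scale (an assumption, not
    something the source output confirms).
    """
    std_err = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z_stat = math.log(estimate) / std_err
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal p-value
    excludes_one = not (lower <= 1.0 <= upper)  # CI-based significance check
    return std_err, z_stat, p_value, excludes_one


# Reported odds ratios and 95% bounds from the accuracy output.
ed_p_vs_ed_n = wald_summary(1.315777, 1.058239, 1.635991)
ed_pn_vs_ed_n = wald_summary(0.8866999, 0.6888363, 1.1413985)

for name, result in [("ED-P vs ED-N", ed_p_vs_ed_n), ("ED-P&N vs ED-N", ed_pn_vs_ed_n)]:
    std_err, z_stat, p_value, excludes_one = result
    print(f"{name}: z = {z_stat:.3f}, p = {p_value:.4f}, CI excludes 1: {excludes_one}")
```

Under the stated assumption, the recovered p-values agree closely with the reported 0.0135 and 0.3506, and the two criteria (p-value and interval) lead to the same conclusion for both comparisons.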

# (ED-P and type S) vs (ED-P&N and type S)
The inference is based on the model-based estimated odds ratios (the value under Estimate). Results are declared statistically significant only if the corresponding p-value is less than 0.05 and/or the corresponding 95% confidence interval does not contain 1. The two middle columns correspond to the lower and upper bounds of the 95% confidence interval for the odds ratios. The interpretation of the above analysis is the following: For type S diagrams, the estimated odds of having j or fewer correct answers with ED-P diagrams are 0.8070 times those with ED-N diagrams (p-value = 0.1367). The estimated odds of having j or fewer correct answers with ED-P diagrams are 1.5950 times those with ED-P&N diagrams (p-value = 0.0152). In other words, for type S, ED-N and ED-P&N diagrams are more likely to produce a higher number of correct answers than ED-P diagrams and there is no difference between ED-N and ED-P&N diagrams.
Time comparisons:

# (ED-P and type S) vs (ED-P&N and type S)
ComparisonStats(fittimemodel,c(0, 1,-1, 0, 0, 0))
## Estimate 95% LB 95% UB p-value
## 0.7815840 0.7073398 0.8636211 0.0000000
The inference is based on the model-based estimated ratios (the value under Estimate). Results are declared statistically significant only if the corresponding p-value is less than 0.05 and/or the corresponding 95% confidence interval does not contain 1. For type I, the p-values and the 95% confidence intervals suggest that participants needed less time with ED-P diagrams than with ED-P&N or ED-N diagrams, and that participants needed less time with ED-P&N than ED-N diagrams.
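To illustrate how a contrast vector such as c(0, 1, -1, 0, 0, 0) could translate into a ratio estimate, the sketch below exponentiates a linear combination of log-scale coefficients. ComparisonStats is not defined in this document, so this behaviour is an assumption, as is the coefficient order (intercept, ED-P, ED-P&N, type I, and the two interactions). The coefficient values are hypothetical and are not study output.

```python
import math


def contrast_ratio(coefficients, contrast):
    """Return exp(sum of contrast[m] * coefficients[m]) for a log-scale model."""
    if len(coefficients) != len(contrast):
        raise ValueError("coefficient and contrast vectors must have equal length")
    linear_combination = sum(c * b for c, b in zip(contrast, coefficients))
    return math.exp(linear_combination)


# Assumed order: intercept, ED-P, ED-P&N, type I, ED-P x type I, ED-P&N x type I.
# Hypothetical log-scale coefficients, for illustration only.
hypothetical_coefficients = [3.0, -0.25, 0.0, 0.4, 0.1, -0.05]

# (ED-P and type S) vs (ED-P&N and type S): difference of the two main effects.
ratio_type_s = contrast_ratio(hypothetical_coefficients, [0, 1, -1, 0, 0, 0])

# The same comparison for type I would also involve the interaction terms
# (again under the assumed coefficient layout).
ratio_type_i = contrast_ratio(hypothetical_coefficients, [0, 1, -1, 0, 1, -1])

print(round(ratio_type_s, 4), round(ratio_type_i, 4))
```

A ratio below 1 from such a contrast would indicate less time needed with the first-named diagram type, which is how the reported estimate of 0.78 is read above.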

Intersection cardinalities results
Accuracy analysis:

F - Study B (LD) - Statistical models and output
Here we present the statistical models used specifically for study 2 and the associated output.

Models
For the overall comparison, a model was fitted to the accuracy data, where
• $\Pr(Y_{ik} \le j)$ is the probability for participant $i$ to provide at most $j - 1$ correct answers to question $k$, where $j = 1, 2, 3, 4, 5$,
• $x_{ik1}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was LD-P,
• $x_{ik2}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was LD-P&N,
for $i = 1, \ldots, 272$ and $k = 1, \ldots, 32$.
For the overall comparison, a model was also fitted to the time data.
For the set and intersection comparisons, a model was fitted to the accuracy data, where
• $\Pr(Y_{ik} \le j)$ is the probability for participant $i$ to provide at most $j - 1$ correct answers to question $k$, where $j = 1, 2, 3, 4, 5$,
• $x_{ik1}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was LD-P,
• $x_{ik2}$ is the indicator that the diagram given to participant $i$ for answering question $k$ was LD-P&N,
• $x_{ik3}$ is the indicator that question $k$ given to participant $i$ was of type I,
for $i = 1, \ldots, 272$ and $k = 1, \ldots, 32$.
For the set and intersection comparisons, a model was also fitted to the time data.

Set cardinalities results
Accuracy analysis:

G - Study C (ED vs LD) - Statistical models and output
Here we present the statistical models used specifically for study 3 and the associated output.

Models
For the overall comparison, a regression model was fitted to the accuracy data, where
• $\Pr(Y_{ik} \le j)$ is the probability for participant $i$ to provide at most $j - 1$ correct answers to question $k$, where $j = 1, 2, 3, 4, 5$,
• $x_{ik1}$ is the indicator that an ED-P&N diagram was given to participant $i$ for answering question $k$,
for $i = 1, \ldots, 185$ and $k = 1, \ldots, 32$.
For the overall comparison, a regression model was fitted to the time data, where
• $t_{ik}$ is the time needed for participant $i$ to answer question $k$,
• $x_{ik1}$ is defined as in the model for the accuracy data,
for $i = 1, \ldots, 185$ and $k = 1, \ldots, 32$.
For the set and intersection comparisons, a regression model was fitted to the accuracy data, where
• $\Pr(Y_{ik} \le j)$ is the probability for participant $i$ to provide at most $j - 1$ correct answers to question $k$, where $j = 1, 2, 3, 4, 5$,
• $x_{ik1}$ is the indicator that an ED-P&N diagram was given to participant $i$ for answering question $k$,
• $x_{ik2}$ is the indicator that the $k$-th question given to participant $i$ was type I,
for $i = 1, \ldots, 185$ and $k = 1, \ldots, 32$.
For the set and intersection comparisons, the following regression model was fitted to the time data: $\log(t_{ik}) = \delta_0 + \delta_1 \ldots$, where
• $t_{ik}$ is the time needed for participant $i$ to answer question $k$,
• $x_{ik1}$ and $x_{ik2}$ are defined as in the model for the accuracy data,
for $i = 1, \ldots, 185$ and $k = 1, \ldots, 32$.

Overall results
Accuracy analysis:

Set cardinalities results
Accuracy analysis:

Intersection cardinalities results
Accuracy analysis:
Time analysis: