Figure 1.
The browser-based interface for remote cell labelling.
Each expert is allowed to navigate through the database of cell images, eliminating the infected cells and marking those that are questionable (i.e., cannot be reliably labelled as infected or uninfected).
Figure 2.
Forward model of the proposed setup.
A set of images, each with a label drawn from a fixed set of possible labels, is sent to a group of experts. Each expert labels each image with a certain probability. The final dataset consists of a matrix of expert responses, each taking a value from a fixed set.
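The forward model can be illustrated with a small simulation. This is only a sketch under stated assumptions: labels are taken to be binary, "a certain probability" is read as a per-expert probability of reporting the true label, and the function name `simulate_responses` and the accuracy values are hypothetical, not taken from the paper.

```python
import random

def simulate_responses(true_labels, expert_accuracies, seed=0):
    """Return an images-by-experts matrix (list of rows) of 0/1 responses.

    Assumption: each expert reports the true label with their own fixed
    probability and flips it otherwise.
    """
    rng = random.Random(seed)
    responses = []
    for label in true_labels:
        row = []
        for acc in expert_accuracies:
            # With probability `acc` the expert is correct, otherwise the label is flipped.
            row.append(label if rng.random() < acc else 1 - label)
        responses.append(row)
    return responses

true_labels = [0, 1, 1, 0, 1]
accuracies = [0.9, 0.8, 0.7]  # hypothetical expert accuracies
matrix = simulate_responses(true_labels, accuracies)
```

Each row of `matrix` holds the responses of all experts to one image, which matches the matrix-shaped dataset described in the caption.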
Figure 3.
Decoding model of the proposed setup.
The expert responses are treated as the observed variables and the true image labels as the latent variables in a mixture model. Expectation Maximisation (EM) is used to obtain the maximum-likelihood estimate of the model parameters given the data.
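To make the decoding step concrete, the following is a minimal EM sketch for a deliberately simplified model: binary labels, one accuracy parameter per expert, and a single class prior. This simplification, the function name `em_combine`, and the toy data are assumptions for illustration; the paper's own mixture model is described in its Methods section and may differ.

```python
import math

def _clip(p, eps=1e-6):
    # Keep probabilities away from 0 and 1 so that the logarithms stay finite.
    return min(max(p, eps), 1.0 - eps)

def em_combine(responses, n_iter=50):
    """Estimate P(label = 1) per image and one accuracy per expert.

    `responses` is an images-by-experts list of rows with 0/1 entries.
    """
    n_images = len(responses)
    n_experts = len(responses[0])
    # Initialise the posteriors with the fraction of 1-votes per image.
    post = [sum(row) / n_experts for row in responses]
    prior, acc = 0.5, [0.5] * n_experts
    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each expert's accuracy.
        prior = _clip(sum(post) / n_images)
        acc = []
        for j in range(n_experts):
            agree = sum(post[i] if responses[i][j] == 1 else 1.0 - post[i]
                        for i in range(n_images))
            acc.append(_clip(agree / n_images))
        # E-step: recompute the posterior of each latent label.
        new_post = []
        for i in range(n_images):
            log1, log0 = math.log(prior), math.log(1.0 - prior)
            for j in range(n_experts):
                if responses[i][j] == 1:
                    log1 += math.log(acc[j])
                    log0 += math.log(1.0 - acc[j])
                else:
                    log1 += math.log(1.0 - acc[j])
                    log0 += math.log(acc[j])
            m = max(log1, log0)
            e1, e0 = math.exp(log1 - m), math.exp(log0 - m)
            new_post.append(e1 / (e1 + e0))
        post = new_post
    return post, acc, prior

# Toy data: experts 0 and 1 always agree, expert 2 often disagrees with them.
responses = [[1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
post, acc, prior = em_combine(responses)
```

On this toy input the posteriors follow the two consistent experts, and the third expert receives a lower estimated accuracy.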
Table 1.
Mixture Model Parameters.
Figure 4.
Experimental results on the level of agreement among experts.
More than 8,000 RBC images were presented remotely to nine experts, using the interface illustrated in Figure 1. For example, at least five out of nine experts agree on 97% of the images labelled as negative, 64% of those labelled as positive, and 7% of those labelled as questionable. On the full set of labels, however, the agreement level is only one out of nine, i.e., no two experts agree completely. Note that these percentages are based on individual RBC images: in the column for 3 experts, for example, it is not the same 3 experts who agree on every image but possibly different sets of 3 experts for different images.
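An agreement level of this kind can be computed per image and then aggregated. The sketch below assumes that the denominator for a category is the set of images that at least one expert assigned to that category; the caption does not state the denominator, so this choice, the function name `agreement_fraction`, and the toy labels are all hypothetical.

```python
def agreement_fraction(responses, label, k):
    """Fraction of images on which at least `k` experts chose `label`.

    Assumption: only images that received `label` from at least one
    expert are counted in the denominator.
    """
    votes = [row.count(label) for row in responses]
    considered = [v for v in votes if v > 0]
    if not considered:
        return float("nan")
    return sum(1 for v in considered if v >= k) / len(considered)

# Toy data: 4 images, 3 experts, labels "neg", "pos", or "q" (questionable).
responses = [
    ["neg", "neg", "neg"],
    ["neg", "pos", "pos"],
    ["pos", "pos", "q"],
    ["neg", "neg", "q"],
]
neg_2 = agreement_fraction(responses, "neg", 2)  # 2 of 3 images
pos_2 = agreement_fraction(responses, "pos", 2)  # 2 of 2 images
q_2 = agreement_fraction(responses, "q", 2)      # 0 of 2 images
```

Because the count is taken per image, the experts who agree may differ from one image to the next, as the caption notes.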
Figure 5.
Experimental results on the level of self-inconsistency of each expert within each category.
More than 8,000 RBC images were separately presented to these experts, using the interface illustrated in Figure 1. For example, expert 1 changed her/his decision on 1.6% of the images that s/he had labelled as negative. Similarly, s/he changed her/his mind on 1.3% and 1.8% of the positive and questionable images, respectively.
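A self-inconsistency rate of this kind could be computed as follows. The sketch assumes that each expert labelled the same images in two passes and that the rate for a category is the share of first-pass labels in that category that changed in the second pass; the function name `self_inconsistency` and the toy labels are hypothetical.

```python
def self_inconsistency(first_pass, second_pass):
    """Per category, the fraction of first-pass labels that changed on the second pass."""
    totals, changed = {}, {}
    for a, b in zip(first_pass, second_pass):
        totals[a] = totals.get(a, 0) + 1
        if a != b:
            changed[a] = changed.get(a, 0) + 1
    return {c: changed.get(c, 0) / totals[c] for c in totals}

# Toy data for one expert: labels for the same 7 images in two passes.
first = ["neg", "neg", "neg", "neg", "pos", "pos", "q"]
second = ["neg", "neg", "neg", "pos", "pos", "pos", "neg"]
rates = self_inconsistency(first, second)
# rates == {"neg": 0.25, "pos": 0.0, "q": 1.0}
```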
Figure 6.
Performance results from 9 simulated experts with varying average ensemble accuracies.
The combined accuracy (in green) is always higher than the accuracy of the best individual expert in the ensemble. Refer to the Methods section for details.
Figure 7.
Experimental performance metrics of the experts.
The metrics are calculated after combining the responses of all the experts using EM and then assuming the results to be correct. They are defined in terms of the numbers of true positive, true negative, false positive, and false negative labels.
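For illustration, the four counts can be tallied against the combined labels and turned into ratios. The sketch below uses four widely used metrics (sensitivity, specificity, positive and negative predictive value) as examples; it does not assume these are the ones plotted in the figure, and the function names and toy labels are hypothetical.

```python
def confusion_counts(expert_labels, reference_labels):
    """Count TP, TN, FP, FN for 0/1 labels, treating `reference_labels` as correct."""
    tp = tn = fp = fn = 0
    for e, r in zip(expert_labels, reference_labels):
        if r == 1 and e == 1:
            tp += 1
        elif r == 0 and e == 0:
            tn += 1
        elif r == 0 and e == 1:
            fp += 1
        elif r == 1 and e == 0:
            fn += 1
    return tp, tn, fp, fn

def standard_metrics(tp, tn, fp, fn):
    """Standard ratio metrics; a zero denominator yields NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

expert = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]  # e.g. the EM-combined labels, assumed correct
counts = confusion_counts(expert, reference)  # (2, 2, 1, 1)
metrics = standard_metrics(*counts)
```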
Figure 8.
Sample cells classified by the proposed methodology.
Each observation made by an expert can take one of three category values. The parameters governing the model are defined in terms of the true category of the observed data point and of whether the expert labels the data point as 0 or 1. Please refer to the Methods section for further details.
Figure 9.
ROC curves for smear-level diagnosis.
Panels A) and B) show the curves for two different parasitemia conditions.