Expectation-Maximization Binary Clustering for Behavioural Annotation
Fig 1
Comparison of the EMC (left) and EMbC (right) algorithms. Bivariate (velocity/turn) scatter plots showing the clustering reached by each algorithm for the same trajectory and identical initial conditions. Clusters are shown in different colours. In the right panel, the EMbC delimiters determining the final L/H binary regions are depicted as dashed lines (r.L, rL.) and dot-dashed lines (r.H, rH.). The centroid of each cluster is shown as a black dot. Left: the EMC yields an output clustering that is difficult to link to clear semantics. Right: the EMbC is driven by the delimiters, forcing the centroids to lie within the associated binary regions, yielding a final clustering that can be clearly interpreted in terms of L/H values of the variables (orange: LL, red: LH, cyan: HL and blue: HH). The match between binary regions and clusters is not perfect because data points are assigned to clusters according to their weights, not according to the delimiter values. In this case the EMbC performs better (the clustering log-likelihoods are -3.3368 for the EMC and -3.2180 for the EMbC), but this result cannot be generalized.
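The caption's remark that binary regions and clusters need not coincide can be illustrated with a minimal sketch. This is not the EMbC implementation: it assumes a single delimiter per variable (the figure shows separate L and H delimiter lines), axis-aligned Gaussian clusters, and equal priors, and every name and number in it (`binary_region`, `gaussian_weights`, the delimiter, mean and variance values) is hypothetical and chosen only for illustration.

```python
import numpy as np

# First letter refers to velocity, second to turn (as in the caption).
LABELS = ["LL", "LH", "HL", "HH"]

def binary_region(points, delimiters):
    """Index into LABELS of the L/H region each point falls in.

    Assumption: one delimiter per variable; a value is 'H' if it
    exceeds the delimiter and 'L' otherwise.
    """
    high = points > delimiters                      # shape (n, 2)
    return high[:, 0].astype(int) * 2 + high[:, 1].astype(int)

def gaussian_weights(points, means, variances, priors):
    """Posterior weight of each cluster for each point.

    Assumption: axis-aligned (diagonal-covariance) Gaussians.
    """
    diff = points[:, None, :] - means[None, :, :]   # shape (n, k, 2)
    log_pdf = -0.5 * np.sum(
        diff**2 / variances + np.log(2 * np.pi * variances), axis=2
    )
    log_w = log_pdf + np.log(priors)
    log_w -= log_w.max(axis=1, keepdims=True)       # numerical stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

# Hypothetical parameters: a broad LL cluster and three tight clusters.
delimiters = np.array([1.0, 1.0])
means = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5], [1.5, 1.5]])
variances = np.array([[1.0, 1.0], [0.05, 0.05], [0.05, 0.05], [0.05, 0.05]])
priors = np.full(4, 0.25)

points = np.array([[1.05, 0.95],   # just inside the HL region
                   [1.50, 0.50]])  # at the HL centroid

regions = binary_region(points, delimiters)
weights = gaussian_weights(points, means, variances, priors)
clusters = weights.argmax(axis=1)  # assignment by largest weight

for r, c in zip(regions, clusters):
    print("region:", LABELS[r], "cluster:", LABELS[c])
```

With these made-up parameters, the first point lies in the HL region but receives its largest weight from the broad LL cluster, while the second point is assigned to HL in both senses. This is the kind of mismatch the caption describes: the delimiters define the regions, but the weights decide the assignment.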