
The authors have declared that no competing interests exist.

Conceived and designed the experiments: GJ SR. Performed the experiments: GJ SR. Analyzed the data: GJ SR. Contributed reagents/materials/analysis tools: GJ SR. Wrote the paper: GJ CF SR.

We show that the Confusion Entropy, a measure of performance in multiclass problems, has a strong (monotone) relation with the multiclass generalization of a classical metric, the Matthews Correlation Coefficient. Analytical results are provided for the limit cases of general no-information (n-face dice rolling) and of binary classification. Computational evidence supports the claim in the general case.

Comparing classifiers' performance is one of the most critical tasks in machine learning. Comparison can be carried out either by means of statistical tests

The definition of performance measures in the context of multiclass classification is still an open research topic as recently reviewed

One relevant case study regards the attempt to extend the Area Under the Curve (AUC) measure, one of the most widely used measures for binary classifiers, which has no straightforward extension to the multiclass case. The AUC is associated with the Receiver Operating Characteristic (ROC) curve

Other measures are more naturally extended, such as the accuracy (ACC,

For binary tasks, MCC has attracted the attention of the machine learning community as a method that summarizes into a single value the confusion matrix

A second family of measures with a natural definition for multiclass confusion matrices comprises the functions derived from the concept of (information) entropy, first introduced in

In our study, we investigate the intriguing similarity between CEN and MCC. In particular, we experimentally show that the two measures are strongly correlated and that their relation is globally monotone and locally almost linear. Moreover, we provide a brief outline of the mathematical links between CEN and MCC, with detailed examples in limit cases. Discriminancy and consistency ratios are discussed as comparative factors, together with dependence on the number of classes, class imbalance, and behaviour on randomized labels.

Given a classification problem on

In information theory, the entropy
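For reference, the underlying notion is the Shannon entropy of a discrete probability distribution \(p = (p_1, \ldots, p_N)\), in its standard form:

```latex
H(p) = -\sum_{i=1}^{N} p_i \log p_i
```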

The Confusion Entropy measure CEN for a confusion matrix
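For completeness, we recall the shape of the definition (our transcription of the standard formulation by Wei and colleagues; \(C\) denotes the \(N \times N\) confusion matrix, rows indexing the true class):

```latex
P^{j}_{i,j} = \frac{C_{i,j}}{\sum_{k=1}^{N}\left(C_{j,k} + C_{k,j}\right)}, \quad i \neq j,
\qquad
\mathrm{CEN}_j = -\sum_{k \neq j}\left( P^{j}_{j,k}\,\log_{2(N-1)} P^{j}_{j,k}
                                       + P^{j}_{k,j}\,\log_{2(N-1)} P^{j}_{k,j} \right),
```
```latex
P_j = \frac{\sum_{k}\left(C_{j,k} + C_{k,j}\right)}{2\sum_{k,l} C_{k,l}},
\qquad
\mathrm{CEN} = \sum_{j=1}^{N} P_j\,\mathrm{CEN}_j .
```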

For

The definition of the MCC in the multiclass case was originally reported in

The covariance function between X and Y can be written as follows:

Finally, the Matthews Correlation Coefficient MCC can be written as:
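Writing \(s\) for the total number of samples, \(c = \sum_k C_{kk}\) for the number of correctly classified samples, and \(t_k = \sum_l C_{kl}\), \(p_k = \sum_l C_{lk}\) for the row and column sums of the confusion matrix, the standard multiclass formulation (following Gorodkin, reproduced here for reference) is:

```latex
\mathrm{MCC} = \frac{c\,s - \sum_{k} t_k\,p_k}
{\sqrt{s^2 - \sum_{k} p_k^2}\;\sqrt{s^2 - \sum_{k} t_k^2}}
```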

As discussed above, CEN and MCC live in different ranges, whose extreme values are reached under different conditions. In Box 1 of

It is worth noting that CEN is more discriminant than MCC in specific situations, although this property is not always welcome. For instance, in

For small sample sizes, we can show that CEN has higher discriminant power than MCC,

We now show an intriguing relationship between MCC and CEN. First consider the confusion matrix

A numerical simulation shows that the tMCC approximation in


To compare measures, we also consider the degree-of-consistency indicator

The behaviour of the Confusion Entropy is instead rather different from that of MCC and ACC for the family of

The MCC measure of this confusion matrix is

On the other hand, the Confusion Entropy for the same family of matrices is

Another pathological case arises for dice-rolling classification on unbalanced classes: because of the multiplicative invariance of the measures, we can assume that the confusion matrix for this case has all entries equal to one except for the last row, whose entries are all

In the two-class case (P: positives, N: negatives), the confusion matrix is
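In the usual TP/TN/FP/FN notation (rows indexing the true class), the binary confusion matrix and the corresponding MCC read (a standard identity, reproduced here for reference):

```latex
C = \begin{pmatrix} \mathrm{TP} & \mathrm{FN} \\ \mathrm{FP} & \mathrm{TN} \end{pmatrix},
\qquad
\mathrm{MCC} = \frac{\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN}}
{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}
```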

The Confusion Entropy can be written for the binary case as:
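Specializing the general multiclass definition to \(N = 2\) (so that logarithms are taken in base 2) yields, in our transcription, with \(d_P = 2\,\mathrm{TP} + \mathrm{FN} + \mathrm{FP}\), \(d_N = 2\,\mathrm{TN} + \mathrm{FN} + \mathrm{FP}\) and \(s\) the total number of samples:

```latex
\mathrm{CEN} = -\frac{1}{2s}\left[
\mathrm{FN}\log_2\frac{\mathrm{FN}}{d_P} + \mathrm{FP}\log_2\frac{\mathrm{FP}}{d_P}
+ \mathrm{FP}\log_2\frac{\mathrm{FP}}{d_N} + \mathrm{FN}\log_2\frac{\mathrm{FN}}{d_N}
\right]
```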

Gray vertical lines correspond to the examples provided in

Indeed, unlike in the multiclass case, CEN and MCC are poorly correlated for two classes. We computed MCC and CEN for all the 4 598 125 possible confusion matrices for a binary classification task on
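Both measures are straightforward to implement. The following Python sketch (our own illustration, not the authors' code) computes the multiclass MCC in Gorodkin's formulation and CEN as defined by Wei and colleagues, for a confusion matrix given as a list of rows (rows indexing the true class):

```python
import math

def mcc(C):
    """Multiclass Matthews Correlation Coefficient (Gorodkin's R_K statistic)."""
    n = len(C)
    s = sum(sum(row) for row in C)                 # total number of samples
    c = sum(C[k][k] for k in range(n))             # correctly classified samples
    t = [sum(C[k]) for k in range(n)]              # row sums: true class counts
    p = [sum(C[i][k] for i in range(n)) for k in range(n)]  # column sums: predicted counts
    cov = c * s - sum(t[k] * p[k] for k in range(n))
    denom = (math.sqrt(s * s - sum(x * x for x in p))
             * math.sqrt(s * s - sum(x * x for x in t)))
    return cov / denom if denom else 0.0           # convention: 0 when undefined

def cen(C):
    """Confusion Entropy; logarithms taken in base 2(N-1) as in the definition."""
    n = len(C)
    total = sum(sum(row) for row in C)
    base = 2.0 * (n - 1)
    value = 0.0
    for j in range(n):
        d = sum(C[j][k] + C[k][j] for k in range(n))  # mass involving class j
        if d == 0:
            continue
        cen_j = 0.0
        for k in range(n):
            if k == j:
                continue
            for x in (C[j][k], C[k][j]):           # both misclassification directions
                if x:
                    prob = x / d
                    cen_j -= prob * math.log(prob, base)
        value += d / (2.0 * total) * cen_j          # weight P_j = d / (2 * total)
    return value
```

For example, a perfect 3-class classifier (a diagonal matrix) yields MCC = 1 and CEN = 0, while the binary matrix [[1, 2], [2, 1]] yields a CEN greater than one, illustrating that CEN can exceed one for two classes.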

We compared the Matthews Correlation Coefficient (MCC) and the Confusion Entropy (CEN) as performance measures of a classifier in multiclass problems. We have shown, both analytically and empirically, that they behave consistently in practical cases. However, each of them is better tailored to different situations, and some care should be taken in the presence of limit cases.

Both MCC and CEN improve over Accuracy (ACC), by far the simplest and most widespread measure in the scientific literature. The problem with ACC is that it copes poorly with unbalanced classes and cannot distinguish among different misclassification distributions.
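A toy illustration of this weakness (our own example): a trivial classifier that always predicts the majority class on a 95/5 split scores high accuracy but zero MCC.

```python
import math

# Binary confusion counts: a classifier that always predicts "negative"
# on a dataset of 95 negatives and 5 positives.
tn, fp = 95, 0   # all 95 negatives predicted negative
fn, tp = 5, 0    # all 5 positives predicted negative as well

acc = (tp + tn) / (tp + tn + fp + fn)   # 0.95: looks excellent
num = tp * tn - fp * fn
den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = num / den if den else 0.0         # 0.0: the classifier carries no information
print(acc, mcc)
```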

CEN has recently been proposed to provide a high level of discrimination even between very similar confusion matrices. However, we show that this feature is not always welcome, as in the case of random dice rolling, for which

Our analysis also shows that CEN cannot be reliably used in the binary case, as its definition assigns high entropy even in regimes of high accuracy, and it can even take values larger than one.

In the most general case, MCC is a good compromise among discriminancy, consistency, and coherent behaviour with a varying number of classes, unbalanced datasets, and randomization. Given the strong linear relation between CEN and a logarithmic function of MCC, the two are interchangeable in the majority of practical cases. Furthermore, the behaviour of MCC remains consistent between binary and multiclass settings.

Our analysis does not concern threshold classifiers; whenever a ROC curve can be drawn, generalized versions of the Area Under the Curve algorithm or other similar measures represent a more immediate choice