
The author has declared that no competing interests exist.

Conceived and designed the experiments: BCR. Performed the experiments: BCR. Analyzed the data: BCR. Contributed reagents/materials/analysis tools: BCR. Wrote the paper: BCR.

Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.

Mutual information (MI)

The MI between two data sets

This paper describes a method for estimating the MI between a discrete data set and a continuous (scalar or vector) data set, using a similar approach to that of Ref.

MI between a discrete and a continuous variable is equivalent to a weighted form of the Jensen–Shannon (JS) divergence

This section explains how to apply our nearest-neighbor method for estimating MI; the derivation is left to the Analysis section. We will also describe the binning method that we compare with our estimator.

The input to an MI estimator is a list of

(A) An example joint probability density

For each data point

In our implementation
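As a concrete illustration, the following is a minimal Python sketch of a nearest-neighbor MI estimator of this kind (the paper's own implementations are in MATLAB; the function and variable names below are ours, and `k` is the user-chosen neighbor count). It assumes scalar continuous data and that each discrete value occurs more than `k` times.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def psi(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def mi_discrete_continuous(x, y, k=3):
    """Sketch of a nearest-neighbor MI estimate (in nats) between a discrete
    array x and a continuous scalar array y."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    N = len(y)
    total = 0.0
    for i in range(N):
        same = np.flatnonzero(x == x[i])        # points sharing x[i]'s discrete value
        Nx = len(same)
        # distance from y[i] to its k-th nearest neighbor among same-label points
        d = np.sort(np.abs(y[same] - y[i]))[k]  # index 0 is point i itself
        # m: points of ANY label within that distance (excluding point i)
        m = np.count_nonzero(np.abs(y - y[i]) <= d) - 1
        total += psi(N) - psi(Nx) + psi(k) - psi(m)
    return total / N
```

As a sanity check: for `y` uniform on (-1, 1) with `x = sign(y)`, the true MI is ln 2, about 0.69 nats, and the estimate should land near that value for a few thousand samples.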

We also implemented a binning method to compare with our nearest-neighbor method. Binning methods make the data completely discrete by grouping the data points into bins in the continuous variable

The average is taken over all measurements
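For comparison, here is a minimal sketch of a plug-in binning estimator of the kind described above, using equal-width bins (the names and the equal-width choice are ours, not necessarily the paper's):

```python
import numpy as np

def mi_binned(x, y, n_bins=10):
    """Plug-in MI estimate (in nats) after grouping the continuous values y
    into n_bins equal-width bins; x holds the discrete labels."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    N = len(y)
    edges = np.linspace(y.min(), y.max(), n_bins + 1)[1:-1]  # interior bin edges
    yb = np.digitize(y, edges)                               # bin index per point
    total = 0.0
    for xv in np.unique(x):
        p_x = np.count_nonzero(x == xv) / N
        for b in np.unique(yb):
            p_b = np.count_nonzero(yb == b) / N
            p_joint = np.count_nonzero((x == xv) & (yb == b)) / N
            if p_joint > 0.0:
                total += p_joint * np.log(p_joint / (p_x * p_b))
    return total
```

Note the well-known weakness this illustrates: the result depends on `n_bins`, and the plug-in estimate is never negative, so independent data sets still yield a small positive MI.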

In the Supporting Information we have included two MATLAB implementations of our method: a general-purpose estimator that works with vector-valued data sets, and a faster implementation for the usual case where both data sets are scalars (simple numbers). The Supporting Information also contains our implementation of an MI estimator using the binning method, as well as the testing script that compares the three estimators and generated the plots for this paper.

To test our method, we chose two simple distributions

(A) Sampling distributions

Both the nearest-neighbor method and the binning method involve a somewhat arbitrary parameter that must be set by the user. The nearest-neighbor method requires that the user specify

Our first conclusion is that there is a much simpler prescription for setting the

Our second conclusion is that there is

(A) Error from the binning method divided by error from the nearest-neighbor method. Errors in MI were calculated for each of the 100 square-wave (light blue) and Gaussian (purple) data sets of length 10,000 (see

We conclude that MI estimation by the nearest-neighbor method is far more accurate than binning-based MI estimates, barring a lucky guess of the unknowable best value of

Here we derive the formula for our nearest-neighbor MI estimator.

Consider a discrete variable

Here

The remaining task is to estimate the logarithm of two continuous distributions evaluated at given data points. For this we use a nearest-neighbor entropy estimator originally developed by Kozachenko and Leonenko
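Concretely, writing the discrete variable as $X$ and the continuous one as $Y$ (our notation), the quantity being estimated is

$$ I(X,Y) \;=\; \sum_{x} \int p(x,y)\,\log\frac{p(y\mid x)}{p(y)}\,dy \;=\; \Big\langle \log p(y_i \mid x_i) \;-\; \log p(y_i) \Big\rangle_i \, , $$

so each sampled point contributes an estimate of $\log p(y\mid x)$ (from neighbors sharing its discrete value) and of $\log p(y)$ (from all neighbors).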

For each sampled data point
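In one dimension, the Kozachenko–Leonenko estimator can be sketched as follows (our notation; `k` is the neighbor index, and `log 2` is the log-volume of the one-dimensional unit ball):

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def psi(n):
    """Digamma at a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))

def kl_entropy_1d(y, k=3):
    """Kozachenko-Leonenko sketch of the differential entropy of scalar
    samples y, in nats: H ~ psi(N) - psi(k) + log 2 + <log r_k>,
    where r_k is each point's distance to its k-th nearest neighbor."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    log_r = np.empty(N)
    for i in range(N):
        log_r[i] = np.log(np.sort(np.abs(y - y[i]))[k])  # k-th NN distance
    return psi(N) - psi(k) + np.log(2.0) + log_r.mean()
```

For a standard Gaussian sample the true differential entropy is (1/2) log(2 pi e), about 1.42 nats, which the sketch should approach for a few thousand points.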

There is a systematic averaging error that comes from the fact that the

As mentioned before, the mutual information between discrete and continuous data is equivalent to a weighted Jensen–Shannon (JS) divergence between the conditional distributions
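In symbols (our notation), with the discrete probabilities $p(x)$ as weights, this equivalence reads

$$ I(X,Y) \;=\; H\!\Big(\sum_{x} p(x)\, p(y\mid x)\Big) \;-\; \sum_{x} p(x)\, H\big(p(y\mid x)\big) \;=\; \mathrm{JS}_{p(x)}\big(\{\,p(y\mid x)\,\}\big) \, , $$

where $H$ denotes the differential entropy of a distribution over $y$; the usual unweighted JS divergence is recovered when the discrete values are equally probable.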


The author wishes to acknowledge Vikram Agarwal and Walter Ruzzo for helpful discussions, and Andrew Laszlo, Henry Brinkerhoff, Jens Gundlach and Jenny Mae Samson for valuable comments on this paper.