Figure 1.
(A) An example joint probability density where
is a real-valued scalar and
can take one of three values, indicated red, blue and green. For each value of
the probability density in
is shown as a plot of that color, whose area is proportional to
. (B) A set of
data pairs sampled from this distribution, where
is represented by the color of each point and
by its position on the
-axis. (C) The computation of
in our nearest-neighbor method. Data point
is the red dot indicated by a vertical arrow. The full data set is on the upper line, and the subset of all red data points is on the lower line. We find that the data point which is the 3rd-closest neighbor to
on the bottom line is the 6th-closest neighbor on the top line. Dashed lines show the distance
from point
out to the 3rd neighbor.
,
, and for this point
and
. (D) A binning of the data into equal bins containing
data points. MI can be estimated from the numbers of points of each color in each bin.
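The neighbor counting described in panel (C) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `neighbor_counts`, its arguments, and the choice to count ties with `<=` are assumptions made for the example. It finds the distance from a chosen point to its k-th closest neighbor of the same color, then counts how many points of any color lie within that distance.

```python
import numpy as np

def neighbor_counts(values, labels, i, k=3):
    """Hypothetical helper illustrating panel (C).

    Returns (d, m): d is the distance from point i to its k-th nearest
    neighbor among points sharing its label, and m is the number of points
    in the full data set (excluding i itself) that lie within distance d.
    Assumes point i has at least k same-label neighbors.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    dist = np.abs(values - values[i])      # distance of every point to point i

    same = labels == labels[i]             # the "lower line": same-color subset
    same[i] = False                        # do not count the point itself
    d = np.sort(dist[same])[k - 1]         # distance to k-th same-color neighbor

    others = np.ones(len(values), dtype=bool)
    others[i] = False                      # the "upper line": everything but i
    m = int(np.sum(dist[others] <= d))     # ties counted as inside (assumption)
    return d, m

# Toy data: alternating red/blue points on a line.
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["r", "b", "r", "b", "r", "b", "r"]
d, m = neighbor_counts(values, labels, i=0, k=3)
```

In this toy data set the 3rd-closest red neighbor of the first point is also its 6th-closest neighbor overall, mirroring the situation drawn in the panel.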
Figure 2.
MI estimated by nearest-neighbors versus binning.
(A) Sampling distributions (thick lines) represented by a differently colored graph in
for each of three possible values of the discrete variable
(red, blue and green). A histogram of a representative data set for each distribution is overlaid using a thinner line. (B) MI estimates as a function of
using the nearest-neighbor estimator. 100 data sets were constructed for each distribution, and the MI of each data set was estimated separately for different values of
. The median MI estimate of the 100 data sets for each
-value is shown with a black line; the shaded region indicates the 10%–90% range of MI estimates. (C) MI estimates plotted as a function of bin size
using the binning method (right panel), applied to the same 100 data sets for each distribution. The black line shows the median MI estimate of the 100 data sets for each
-value; the shaded region indicates the 10%–90% range.
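For comparison, a binning estimate of the kind referred to in these captions can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the function name `binned_mi` is hypothetical, the bins are formed by sorting the data and grouping consecutive runs of `n` points, the standard plug-in formula is used, and the result is expressed in nats.

```python
import numpy as np

def binned_mi(values, labels, n):
    """Hypothetical plug-in MI estimate (in nats) from equal-occupancy bins.

    Points are sorted by value and grouped into consecutive bins of n points;
    the last bin is smaller if len(values) is not a multiple of n.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(values, kind="stable")
    bin_index = np.arange(len(values)) // n          # bin of each sorted point
    _, label_index = np.unique(labels[order], return_inverse=True)

    # Contingency table: rows are labels (colors), columns are bins.
    counts = np.zeros((label_index.max() + 1, bin_index.max() + 1))
    np.add.at(counts, (label_index, bin_index), 1)

    p_joint = counts / counts.sum()
    p_label = p_joint.sum(axis=1, keepdims=True)     # marginal over bins
    p_bin = p_joint.sum(axis=0, keepdims=True)       # marginal over labels
    nonzero = p_joint > 0                            # skip empty cells (0 log 0 = 0)
    ratio = p_joint[nonzero] / (p_label @ p_bin)[nonzero]
    return float(np.sum(p_joint[nonzero] * np.log(ratio)))

# Perfectly separated colors give log(2) nats; fully mixed bins give 0.
mi_separated = binned_mi([0, 1, 2, 3], ["r", "r", "b", "b"], n=2)
mi_mixed = binned_mi([0, 1, 2, 3], ["r", "b", "r", "b"], n=2)
```

Repeating such a calculation over many data sets and several bin sizes, then taking `np.percentile` at 10, 50 and 90, would give a median line and shaded band of the kind the caption describes.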
Figure 3.
Binning error relative to nearest-neighbors error.
(A) Error from the binning method divided by error from the nearest-neighbor method. Errors in MI were calculated for each of the 100 data sets drawn from the square-wave (light blue) and Gaussian (purple) distributions, each of length 10,000 (see Figure 2). Each line shows the ratio of the median MI for a given number of neighbors estimated using binning, as a function of n, to the median (over all data sets and all values of
) of all MI estimates using nearest neighbors. The binning method gives superior results for values of
for which this ratio is less than one. Evidently, there is no optimal value of
that works for all distributions:
works well for the square wave distribution but
is better for a Gaussian distribution. (B) MI error using the nearest-neighbor method versus the binning method for the 400-data-point sets.