Mutual Information between Discrete and Continuous Data Sets

doi:10.1371/journal.pone.0087357

Figure 1.

Procedures for estimating MI.

(A) An example joint probability density of a continuous and a discrete variable, where the continuous variable is a real-valued scalar and the discrete variable can take one of three values, indicated red, blue and green. For each value of the discrete variable, the probability density of the continuous variable is shown as a plot of that color, whose area is proportional to the probability of that discrete value. (B) A set of data pairs sampled from this distribution, where the discrete value is represented by the color of each point and the continuous value by its position along the axis. (C) The computation of the MI contribution of a single data point in our nearest-neighbor method. The data point is the red dot indicated by a vertical arrow. The full data set is on the upper line, and the subset of all red data points is on the lower line. We find that the data point which is the 3rd-closest neighbor to this point on the bottom line is the 6th-closest neighbor on the top line. Dashed lines show the distance from the point out to its 3rd neighbor. For this point, with k = 3, the corresponding neighbor count in the full data set is m = 6. (D) A binning of the data into bins that each contain the same number of data points. MI can be estimated from the numbers of points of each color in each bin.
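
The nearest-neighbor computation in panel (C) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the contribution of point i takes the digamma form I_i = psi(N) - psi(N_x) + psi(k) - psi(m), where psi is the digamma function, N is the total number of points, N_x the number of points sharing point i's discrete value, k the neighbor index used on the lower line, and m the number of points of the full data set within the distance to that k-th neighbor (6 in the panel). The MI estimate is the average of I_i over all points. The function name `mi_nearest_neighbor` and the brute-force O(N^2) distance search are illustrative choices; NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.special import digamma


def mi_nearest_neighbor(labels, values, k=3):
    """Estimate MI (in nats) between discrete labels and continuous 1-D values.

    Sketch of a nearest-neighbor estimator; assumes every label occurs at
    least k + 1 times so that each point has k same-label neighbors.
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Brute-force pairwise distances along the continuous axis (O(N^2) memory).
    dist = np.abs(values[:, None] - values[None, :])
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor

    total = 0.0
    for i in range(n):
        same = labels == labels[i]
        n_x = np.count_nonzero(same)  # points sharing this point's label (lower line)
        if n_x <= k:
            raise ValueError("each label needs at least k + 1 points")
        # Distance to the k-th nearest neighbor within the same-label subset;
        # the point itself sits at +inf and sorts last.
        d = np.sort(dist[i, same])[k - 1]
        # Neighbors in the full data set (upper line) within that distance,
        # counting the k-th same-label neighbor itself.
        m = np.count_nonzero(dist[i] <= d)
        total += digamma(n) - digamma(n_x) + digamma(k) - digamma(m)
    return total / n


# Usage: two equally likely labels whose values never overlap.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
values = np.concatenate([rng.uniform(0, 1, 100), rng.uniform(10, 11, 100)])
print(mi_nearest_neighbor(labels, values, k=3))  # close to ln 2
```

With two well-separated clusters, every k-th same-color neighbor is also the k-th neighbor overall (m = k), so the estimate reduces to psi(200) - psi(100), slightly above the true value ln 2.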

Figure 2.

MI estimated by the nearest-neighbor method versus binning.

(A) Sampling distributions (thick lines), each represented by a differently colored graph of the continuous variable for each of the three possible values of the discrete variable (red, blue and green). A histogram of a representative data set for each distribution is overlaid as a thinner line. (B) MI estimates as a function of the number of neighbors k using the nearest-neighbor estimator. 100 data sets were constructed for each distribution, and the MI of each data set was estimated separately for different values of k. The median MI estimate of the 100 data sets for each value of k is shown with a black line; the shaded region spans the 10th to 90th percentiles of the MI estimates. (C) MI estimates plotted as a function of bin size using the binning method (right panel), using the same 100 data sets for each distribution. The black line shows the median MI estimate of the 100 data sets for each bin size; the shaded region indicates the 10th to 90th percentile range.
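
The binning method, whose bin size is varied in panel (C), can be sketched as a plug-in estimate computed from the contingency table of colors versus bins. This sketch assumes that bins are formed by ranking the continuous values and grouping consecutive runs of `n_per_bin` points, and that MI is computed with no finite-sample correction; the paper's binning estimator may differ in these details. The function name `mi_binned` and its parameters are hypothetical.

```python
import numpy as np


def mi_binned(labels, values, n_per_bin):
    """Plug-in MI estimate (in nats) using equal-count bins of the continuous values.

    Points are ranked by value and consecutive runs of n_per_bin points share a
    bin; the last bin holds the remainder if len(values) is not a multiple.
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    bins = np.empty(len(values), dtype=int)
    bins[order] = np.arange(len(values)) // n_per_bin  # bin index from rank

    # Contingency table: rows are discrete labels ("colors"), columns are bins.
    _, label_idx = np.unique(labels, return_inverse=True)
    label_idx = label_idx.reshape(-1)
    counts = np.zeros((label_idx.max() + 1, bins.max() + 1))
    np.add.at(counts, (label_idx, bins), 1)

    # Plug-in MI: sum of p(label, bin) * log(p(label, bin) / (p(label) p(bin))).
    p_joint = counts / counts.sum()
    p_label = p_joint.sum(axis=1, keepdims=True)
    p_bin = p_joint.sum(axis=0, keepdims=True)
    nonzero = p_joint > 0
    ratio = p_joint[nonzero] / (p_label * p_bin)[nonzero]
    return float(np.sum(p_joint[nonzero] * np.log(ratio)))


# Usage: two equally likely labels whose values never overlap.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
values = np.concatenate([rng.uniform(0, 1, 100), rng.uniform(10, 11, 100)])
print(mi_binned(labels, values, n_per_bin=10))   # ln 2: every bin is a single color
print(mi_binned(labels, values, n_per_bin=200))  # 0.0: one bin carries no information
```

Ranking rather than fixed-width intervals keeps the number of points per bin constant, matching the equal bins of Figure 1D; the bin size then plays a role similar to k in the nearest-neighbor method.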

Figure 3.

Binning error relative to nearest-neighbor error.

(A) Error from the binning method divided by error from the nearest-neighbor method. Errors in MI were calculated for each of the 100 data sets of the square-wave (light blue) and Gaussian (purple) distributions, with 10,000 points per data set (see Figure 2). Each line shows, as a function of the bin size n, the median MI error of the binning estimates divided by the median MI error of all nearest-neighbor estimates (taken over all data sets and all values of k). The binning method gives superior results for values of n for which this ratio is less than one. Evidently, no single value of n is optimal for all distributions: one bin size works well for the square-wave distribution, but a different one is better for the Gaussian distribution. (B) MI error using the nearest-neighbor method versus the binning method for the 400-point data sets.
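
The ratio plotted in panel (A) can be expressed in a few lines of NumPy. This sketch assumes that the MI error of an estimate is its absolute deviation from the true MI of the sampling distribution, and that the errors have already been collected into arrays with one row per data set; the function name `binning_to_nn_error_ratio` and the array layout are hypothetical.

```python
import numpy as np


def binning_to_nn_error_ratio(binning_errors, nn_errors):
    """Ratio curve in the style of Figure 3A (a sketch).

    binning_errors: shape (n_datasets, n_bin_sizes), assumed absolute MI error
        of each binning estimate, one column per bin size n.
    nn_errors: shape (n_datasets, n_k_values), assumed absolute MI error of
        each nearest-neighbor estimate, one column per value of k.
    Returns one ratio per bin size; values below 1 mean binning did better.
    """
    # Median over data sets, separately for each bin size.
    binning_median = np.median(np.asarray(binning_errors, dtype=float), axis=0)
    # A single median pooled over all data sets and all values of k.
    nn_median = np.median(np.asarray(nn_errors, dtype=float))
    return binning_median / nn_median


# Usage with made-up errors for 3 data sets, 2 bin sizes and 2 values of k.
binning_errors = np.array([[0.02, 0.05], [0.04, 0.07], [0.03, 0.06]])
nn_errors = np.array([[0.03, 0.05], [0.04, 0.04], [0.05, 0.04]])
print(binning_to_nn_error_ratio(binning_errors, nn_errors))
```

Here the first bin size beats the pooled nearest-neighbor error (ratio below one) and the second does not.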
