Probability density and information entropy of machine learning derived intracranial pressure predictions

doi:10.1371/journal.pone.0306028

Fig 1.

Data workflow schematic for probability density function analysis of Extreme Gradient Boost derived intracranial pressure predictions.

Two cautionary cases (8 and 9) were set apart from the seven hold-out test cases and demonstrated a wide and conflicting difference between measured intracranial pressure and the predictions of the arterial and venous XGB models. All cases underwent data pre-processing, and descriptive statistics and hypothesis tests were computed from the retinal vascular pulse parameters. Intracranial pressure predictions were derived from the retinal arterial and venous parameters independently. Probability density functions were generated from the intracranial pressure predictions of both XGB models, and the median was considered a more favorable point estimate than the mean or the mode. This was likely because the median of a log-normal distribution equals its geometric mean, and it is supported by findings from previous work [3]. Correlations were computed between the absolute difference between the median predicted and measured intracranial pressure (Diff_ICPmd) and the imaging characteristics (n = number of vascular data points analyzed, Bilateral = both eyes tested, nIOP_i = number of induced intraocular pressure levels applied during imaging) and the distribution characteristics (DI_ICP = definite integral ±1 cm water around the median, tsKS = two-sample Kolmogorov-Smirnov statistic, ADS = Anderson-Darling statistic, KS = Kolmogorov-Smirnov statistic, sEnt = Shannon entropy).
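The DI_ICP quantity described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel density estimate stands in for the probability density function; the source does not state how the density was estimated. The function name `median_window_mass` and the synthetic samples are hypothetical and are not the study data. The sketch also checks numerically that the median of a log-normal sample lies close to its geometric mean.

```python
import numpy as np
from scipy.stats import gaussian_kde


def median_window_mass(predictions, half_width=1.0):
    """Return (median, KDE probability mass within median +/- half_width)."""
    x = np.asarray(predictions, dtype=float)
    kde = gaussian_kde(x)  # assumption: Gaussian KDE with SciPy's default bandwidth
    med = float(np.median(x))
    # Definite integral of the estimated density over [med - w, med + w]
    mass = float(kde.integrate_box_1d(med - half_width, med + half_width))
    return med, mass


rng = np.random.default_rng(0)
# Hypothetical predictions in cm water (synthetic, for illustration only)
peaked = rng.lognormal(mean=np.log(25.0), sigma=0.1, size=2000)  # right-skewed
flat = rng.uniform(10.0, 45.0, size=2000)                        # near-uniform

med_peaked, di_peaked = median_window_mass(peaked)
med_flat, di_flat = median_window_mass(flat)

# For a log-normal sample the median approximates the geometric mean
geo_mean = float(np.exp(np.mean(np.log(peaked))))

print(f"peaked: median={med_peaked:.2f}, geometric mean={geo_mean:.2f}, DI={di_peaked:.3f}")
print(f"flat:   median={med_flat:.2f}, DI={di_flat:.3f}")
```

Under these assumptions, a distribution concentrated around its median yields a larger integral than a near-uniform one, which is the property the caption relates to prediction accuracy.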

Fig 2.

Violin plot.

Comparing the distribution of the harmonic regression wave amplitude in the hold-out and cautionary subsets.

Table 1.

Descriptive statistics of the harmonic regression wave amplitude, the cosine and sine coefficients of the first and second harmonics of the hold-out test and cautionary subset cases.

Fig 3.

Ridgeline plot for the probability density function of intracranial pressure predictions derived from the Extreme Gradient Boost algorithm of the arterial and venous pulsation data.

In contrast to the right-skewed distribution that dominates in most cases, the distribution of the arterial predictions in cases 8 and 9 converges on a uniform distribution.

Fig 4.

Overlapping ridgeline plots of intracranial pressure predictions derived from the Extreme Gradient Boost algorithm of the arterial and venous pulsation data.

The two-sample Kolmogorov-Smirnov statistic (tsKS) provides a quantitative comparison between two distributions across their whole range rather than at a single point estimate. Within a single case, the more closely the distributions from the arterial and venous models approximate each other, the lower the tsKS value. Case 4 demonstrates the lowest tsKS (0.080897, p < 0.003), and case 7 the highest (0.48101).
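As a hedged illustration of how such a comparison can be computed, the sketch below uses `scipy.stats.ks_2samp` on synthetic samples. The variable names and the log-normal parameters are assumptions chosen for demonstration; they are not the study's predictions, and the source does not state which software was used.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Hypothetical arterial and venous model predictions (cm water), synthetic only
arterial = rng.lognormal(np.log(25.0), 0.20, size=1500)
venous_close = rng.lognormal(np.log(25.5), 0.20, size=1500)  # nearly the same distribution
venous_far = rng.lognormal(np.log(35.0), 0.20, size=1500)    # clearly shifted distribution

# The statistic is the largest vertical gap between the two empirical CDFs
close = ks_2samp(arterial, venous_close)
far = ks_2samp(arterial, venous_far)

print(f"similar distributions: tsKS={close.statistic:.3f}, p={close.pvalue:.3g}")
print(f"shifted distributions: tsKS={far.statistic:.3f}, p={far.pvalue:.3g}")
```

The statistic is bounded between 0 and 1, so values from different cases can be compared directly, as the caption does for cases 4 and 7.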

Table 2.

Imaging characteristics and probability density distribution parameters of hold-out test cases 1–7 and cautionary cases 8 and 9 are compared to measured and Extreme Gradient Boost predicted intracranial pressure (cm water).

Fig 5.

(A-D) A comparison of the empirical cumulative distribution functions (ECDF) of intracranial pressure predictions derived from the Extreme Gradient Boost algorithm in four cases with contrasting two-sample Kolmogorov-Smirnov statistics (tsKS). Cases 4 and 5 (A, B), the two hold-out test cases with the lowest tsKS, demonstrate favorable concordance between venous and arterial derived predictions, in contrast to the subset cases 8 and 9 (C, D), where the concordance is poor and the ECDFs of the two models are visibly separated. The tsKS statistic depends on a ratio parameter consisting of the product of the numbers of data points in the two distributions divided by their sum [25]. Red = arterial model, Blue = venous model.
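The relationship between the ECDFs and the statistic can be made explicit with a small from-scratch sketch. It assumes the "ratio parameter" refers to the quantity nm/(n+m) conventionally used to scale the two-sample statistic when assessing significance; that reading, the function names, and the toy values are assumptions for illustration.

```python
import numpy as np


def ecdf(sample, grid):
    """Fraction of sample values less than or equal to each grid point."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size


def two_sample_ks(a, b):
    """Return (D, ratio, scaled D) for two samples.

    D is the largest vertical gap between the two ECDFs; ratio is n*m/(n+m).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    grid = np.concatenate([a, b])  # the largest gap occurs at an observed value
    d = float(np.max(np.abs(ecdf(a, grid) - ecdf(b, grid))))
    n, m = a.size, b.size
    ratio = n * m / (n + m)
    return d, ratio, float(np.sqrt(ratio) * d)


# Toy values in cm water (hypothetical, not study data)
arterial = [18.0, 20.0, 22.0, 24.0]
venous = [22.0, 24.0, 26.0, 28.0]

d, ratio, scaled = two_sample_ks(arterial, venous)
print(f"D={d}, ratio={ratio}, sqrt(ratio)*D={scaled:.4f}")
```

Because the ratio grows with both sample sizes, the same ECDF separation is judged more significant when more data points are available in each distribution.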

Table 3.

Multivariate analysis of variance (MANOVA) estimating the associations between Diff_ICPmd and both parameters of the imaging and distribution of Extreme Gradient Boost derived intracranial pressure predictions.

Fig 6.

Correlation matrix comparing distribution and imaging parameters from the hold-out and cautionary subsets.

The features in the top row are of interest. There was a significant negative correlation of Diff_ICPmd with laterality (-0.59; bilateral was numerically coded as 2 and unilateral as 1 for this analysis). The correlation with two parameters of the distribution of the XGB-derived predictions was low to moderate (ADS = Anderson-Darling statistic, KS = Kolmogorov-Smirnov statistic). However, the correlation with DI_ICP (the definite integral ±1 cm water around the median) was strongly negative (-0.52), indicating that the greater the area under the curve within these bounds, the closer the agreement between predicted and measured intracranial pressure. Similarly, the correlation with tsKS (the two-sample Kolmogorov-Smirnov statistic) was significant (0.49), indicating that the greater the overlap between the arterial and venous model distributions, the closer the agreement with measured intracranial pressure. Comparably, Shannon entropy (sEnt) showed a strong positive correlation (0.48), indicating convergence to a uniform distribution (increased randomness) with higher Diff_ICPmd values. nIOP_i = the number of levels of induced intraocular pressure applied during the imaging, n = total number of tested data points.
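The link between Shannon entropy and convergence to a uniform distribution can be sketched as follows. The histogram-based estimator, the bin count, the value range, and the synthetic samples are all assumptions for illustration; the source does not specify how sEnt was computed.

```python
import numpy as np


def shannon_entropy(sample, bins, value_range):
    """Shannon entropy (bits) of a sample discretized into equal-width bins."""
    counts, _ = np.histogram(sample, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()  # empty bins contribute nothing
    return float(-np.sum(p * np.log2(p)))


rng = np.random.default_rng(2)
# Hypothetical predictions in cm water (synthetic, for illustration only)
peaked = rng.lognormal(np.log(25.0), 0.1, size=5000)
flat = rng.uniform(10.0, 45.0, size=5000)

bins, value_range = 35, (10.0, 45.0)  # assumed 1 cm water bin width
h_peaked = shannon_entropy(peaked, bins, value_range)
h_flat = shannon_entropy(flat, bins, value_range)
h_max = float(np.log2(bins))  # upper bound, reached by a perfectly uniform histogram

print(f"peaked: {h_peaked:.2f} bits, flat: {h_flat:.2f} bits, maximum: {h_max:.2f} bits")
```

With a fixed binning, entropy approaches its maximum of log2(bins) as the predictions spread evenly over the range, which matches the caption's reading of higher sEnt as increased randomness.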

Fig 7.

Pearson correlation between the definite integral ±1 cm around the median of the probability density distribution (DI_ICP) and the absolute difference between predicted and measured intracranial pressure (Diff_ICPmd) for the (A) arterial and (B) venous models. Only the arterial model (A; r = -0.76, p = 0.02) achieved statistical significance, in contrast to the venous model (B; r = -0.10, p = 0.799). This indicated that the arterial model was a more discriminatory indicator of agreement between measured and predicted intracranial pressure.
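A correlation of this kind can be computed with `scipy.stats.pearsonr`, as in the sketch below. The nine synthetic value pairs are generated purely for illustration under an assumed negative linear relationship; they are not the values from the study and do not reproduce its coefficients.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)

# Hypothetical values for nine cases (synthetic, not study data)
di_icp = rng.uniform(0.05, 0.40, size=9)  # probability mass near the median
# Assumed relationship: more mass near the median, smaller prediction error
diff_icp = 20.0 - 40.0 * di_icp + rng.normal(0.0, 1.0, size=9)  # cm water

r, p = pearsonr(di_icp, diff_icp)
print(f"r={r:.2f}, p={p:.3g}")
```

With only nine cases, as in the hold-out and cautionary subsets combined, the p-value is sensitive to individual points, so the coefficient is best read alongside the scatter plot.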

Fig 8.

Distribution of intracranial pressure and the number of analyzed image data points from the Extreme Gradient Boost training data set.

Few data points contribute to the model at intracranial pressure levels <17 and >43 cm water, with two participants below (K, M) and two above (C, F) these boundaries. ICP = intracranial pressure in cm water [3].
