Abstract
Even with the powerful statistical parameters derived from the Extreme Gradient Boost (XGB) algorithm, it would be advantageous to define the predicted accuracy to the level of a specific case, particularly when the model output is used to guide clinical decision-making. The probability density function (PDF) of the derived intracranial pressure predictions enables the computation of a definite integral around a point estimate, representing the event’s probability within a range of values. Seven hold-out test cases used for the external validation of an XGB model underwent retinal vascular pulse and intracranial pressure measurement using modified photoplethysmography and lumbar puncture, respectively. The definite integral ±1 cm water from the median (DIICP) demonstrated a negative and highly significant correlation (-0.5213±0.17, p< 0.004) with the absolute difference between the measured and predicted median intracranial pressure (DiffICPmd). The concordance between the arterial and venous probability density functions was estimated using the two-sample Kolmogorov-Smirnov statistic, extending the distribution agreement across all data points. This parameter showed a statistically significant and positive correlation (0.4942±0.18, p< 0.001) with DiffICPmd. Two cautionary subset cases (Case 8 and Case 9), where disagreement was observed between measured and predicted intracranial pressure, were compared to the seven hold-out test cases. Arterial predictions from both cautionary subset cases converged on a uniform distribution in contrast to all other cases where distributions converged on either log-normal or closely related skewed distributions (gamma, logistic, beta). The mean±standard error of the arterial DIICP from cases 8 and 9 (3.83±0.56%) was lower compared to that of the hold-out test cases (14.14±1.07%); the between-group difference was statistically significant (p<0.03).
Although the sample size in this analysis was limited, these results support a dual and complementary analysis approach from independently derived retinal arterial and venous non-invasive intracranial pressure predictions. Results suggest that plotting the PDF and calculating the lower order moments, arterial DIICP, and the two sample Kolmogorov-Smirnov statistic may provide individualized predictive accuracy parameters.
Citation: Abdul-Rahman A, Morgan W, Vukmirovic A, Yu D-Y (2024) Probability density and information entropy of machine learning derived intracranial pressure predictions. PLoS ONE 19(7): e0306028. https://doi.org/10.1371/journal.pone.0306028
Editor: Alon Harris, Icahn School of Medicine at Mount Sinai, UNITED STATES
Received: October 12, 2023; Accepted: June 10, 2024; Published: July 1, 2024
Copyright: © 2024 Abdul-Rahman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: We would like to declare that the authors Anmar Abdul-Rahman, William Morgan, and Dao-Yi Yu are the inventors of the Modified Photoplethysmography method. Furthermore, we have no financial interest in the results of this study. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Introduction
Although machine learning models can be valuable in providing recommendations, the lack of a human factor in the decision-making process and the algorithm’s “black box” nature risk creating a false authority impervious to appeal or rational argument. Uncertainty can be mitigated by statistical goodness of fit parameters, and qualitative and quantitative extensions can assist decision-making. Conventionally, mean squared error (MSE) and its rooted form root mean squared error (RMSE), as well as the mean absolute error (MAE) and its percentage variant (MAPE), are used as statistical goodness of fit parameters for regression models. Although informative, these measures share a common disadvantage: a single-point estimate does not represent the regression performance concerning the distribution of the ground truth element [1]. Another parameter, the coefficient of determination (R²), despite being invariant to linear transformations of the independent variables’ distribution, is dependent on the slope of the regression, and this may mask a wider confidence interval around the regression surface in the presence of a steep regression slope [2]. In this framework, it is crucial to establish an individualized parameter of reliability, particularly when depending on the model’s output for clinical decision-making. In previous work [3], the measures of central tendency [mean, median, peak density (mode)] were used as point estimates of Extreme Gradient Boost (XGB) derived intracranial pressure (ICP). These predictions were generated from all vascular pulsation points in the image field. High agreement was observed between the median predicted and measured intracranial pressure, where the arterial and venous Bland-Altman bias±standard error was 0.1386±1.6545 and 0.0343±1.8013 cm water, respectively.
In addition to the benefits of visualizing the probability density distribution, defining the properties of the distribution would provide further evidence of predictive accuracy.
There are three possibilities for the distributions of XGB-derived probability density functions. If the probability density function (PDF) converged upon a symmetrical unimodal distribution (Gaussian, beta), then the distribution weights, by definition, should coincide [4, 5]. Other symmetrical unimodal distributions (triangular and Cauchy distributions) are not likely, as triangular distributions are not consistent with biological characteristics, and the latter has infinite integral without finite moments of order greater than or equal to one; only fractional absolute moments exist [6, 7]. Although a symmetrical unimodal distribution would be convenient, as the central tendencies would coincide, this instance is unlikely. Skewed distributions (log-normal, beta, gamma, and exponential) are common when mean values are low, variances large, and variables cannot be negative [8]. These represent the most likely candidates for the distribution of the XGB-derived intracranial pressure, where the distribution of central tendencies demonstrated a range of agreements with measured intracranial pressure [3]. Finally, a uniform PDF can exist if predicted intracranial pressure is no better than random.
Although visualization offers a qualitative assessment of the PDF, a quantitative approach would include plotting the PDF and calculating the moments of the distribution. A PDF defines the relationship between a continuous random variable (x) and its probability distribution f(x). This function has four properties [9]:
- The function must be non-negative: $f(x) \geq 0$
- The area under the curve must be equal to one: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- The function f(x) is piecewise continuous.
- The definite integral between two values represents the probability of occurrence of the variable between two points [a,b]: $P(a \leq x \leq b) = \int_{a}^{b} f(x)\,dx$
Machine learning has a wide range of possible applications where predicting the probability density for a variable rather than a point estimate would be more informative. This is either because variance is input-dependent or because small probabilities, although perhaps analytically trivial, represent significant real-world outcomes [10, 11]. This is the case for intracranial pressure estimation, where a small margin of error would result in a significant difference in a clinical outcome. There is a wide variability in normal intracranial pressure values, largely due to differences in age, gender, and body mass index. Although normal values of 7–15 mmHg have been reported [12], in a prospective study of lumbar puncture measured ICP in 339 normal subjects, Bø et al. reported a reference range of approximately 3–22 mmHg [13]. Their findings suggest that physiological ICP may vary by up to a 7-fold range. Additionally, studies demonstrate variation of continuously measured ICP within individuals, likely due to the multitude of interactions between physiologic parameters involved in intracranial pressure homeostasis, including postural, cardiovascular, neurological and respiratory parameters [14]. In a systematic review and meta-analysis of ICP monitoring systems, Zaccharetti et al. found that the average error between simultaneous ICP measurements from different pressure sensors was approximately 1.5 mmHg, but the variability was large, with up to 11.4 mmHg difference in 95% of readings [15]. These factors render determining the precision and accuracy of ICP measurements challenging even in the absence of underlying pathology. While evidence suggests artificial intelligence (AI) algorithms could aid clinical decision-making, current research emphasizes the need for systems enabling model evaluation, bias detection, and generalizability. Wang et al. compared machine learning models to traditional scoring tools for predicting mortality risk in traumatic brain injury patients.
Among 47 studies with 156 models, machine learning models showed relatively high accuracy for both in-hospital and out-of-hospital mortality. Notably, traditional tools achieved comparable accuracy. The authors highlight the need for standardized reporting and validation of machine learning models to ensure clinical applicability and generalizability [16]. Similarly, van Hal et al. assessed bias risk and clinical readiness of studies using AI to predict intracranial hypertension in traumatic brain injury patients. They found most studies had high bias risk and low readiness for clinical integration. Despite promising potential, the authors concluded further improvement and validation are necessary before implementing these models in clinical practice [17]. These findings underline the practical importance of bias detection, as AI algorithms may perform well under specific conditions not replicable in clinical settings. Using properties of the probability density function the approximation of the estimate to the ground truth may be confirmed by the capacity to validate ICP readings, irrespective of the measurement methodology.
The other distinct advantage of this approach is that the probability density function is normalized, enabling a direct comparison between distributions from different samples [18]. Furthermore, it allows the calculation of the probability of the point estimate within a range of values by computing the definite integral, which is defined as the area under the curve between two arbitrary points [a,b].
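The definite integral around a point estimate can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian kernel density estimate, the Silverman rule-of-thumb bandwidth, the trapezoidal integration, the function names, and the simulated predictions are all assumptions made for illustration.

```python
import math
import random
import statistics


def gaussian_kde(samples):
    """Return a density estimate f(x) built from Gaussian kernels."""
    n = len(samples)
    # Silverman's rule-of-thumb bandwidth: an assumption made for this sketch.
    bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return scale * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


def definite_integral(pdf, a, b, steps=400):
    """Trapezoidal approximation of the area under pdf between a and b."""
    h = (b - a) / steps
    area = 0.5 * (pdf(a) + pdf(b))
    area += sum(pdf(a + i * h) for i in range(1, steps))
    return area * h


def di_icp(samples, half_width=1.0):
    """Probability mass within +/- half_width of the sample median."""
    pdf = gaussian_kde(samples)
    median = statistics.median(samples)
    return definite_integral(pdf, median - half_width, median + half_width)


random.seed(0)
# Hypothetical predictions in cm water (simulated, not data from the study).
skewed = [random.lognormvariate(math.log(20.0), 0.15) for _ in range(500)]
flat = [random.uniform(5.0, 45.0) for _ in range(500)]

di_skewed = di_icp(skewed)  # narrow, right-skewed predictions
di_flat = di_icp(flat)      # predictions spread evenly over a wide range

# Sanity check: the density should integrate to about 1 over its full range.
total_area = definite_integral(
    gaussian_kde(skewed), min(skewed) - 10.0, max(skewed) + 10.0, steps=2000
)
```

In this sketch the narrow, skewed sample concentrates more probability within ±1 cm water of its median than the flat sample does, which is the behavior the definite integral is intended to capture.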
Quantitative tests, which include the Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, Anderson-Darling, and Cramer-von Mises statistics, among others, can either evaluate the convergence of a variable on a normal distribution in the case of a single sample test or measure the concordance between two distributions in the case of a two-sample test [19, 20]. Among the most commonly employed tests, the Kolmogorov-Smirnov statistic is weighted to the center of the distribution, and the Anderson-Darling statistic is more sensitive to the tails [21, 22]. A distinct advantage of the quantitative approach is that it can estimate the concordance of intracranial pressure probability density distributions derived from the arterial and venous models. The uncertainty (randomness) in a distribution can likewise be evaluated by computing the Shannon entropy (sEnt), also known as information entropy, which increases with increasing uncertainty in the probability density distribution [23]. A single test statistic can be used to compare the model accuracy across a range of predictions rather than over a single-point estimate. In contrast to a global goodness of fit score, evaluation of the predicted intracranial pressure distribution properties should provide a qualitative and quantitative assessment of the XGB model performance at the level of an individual case. Additionally, this approach could define imaging parameters associated with favorable predictive outcomes.
This work compares the pulsation amplitude decomposition in the frequency domain and properties of the probability density distributions of the hold-out test cases from a previously published work [3] with two cautionary subset cases, 8 and 9. The latter two cases are examples for which the XGB model provided contradictory intracranial pressure predictions.
Materials and methods
Participants were referred to the Lions Eye Institute over six years (2015–2021) from the Neurology and Neurosurgery Departments and were to undergo lumbar puncture for suspicion of idiopathic intracranial hypertension. Written consent was obtained from each participant. Recruitment for the study occurred between 03/03/2016 and 28/11/2021. Several studies have revealed that over 90% of patients with idiopathic intracranial hypertension are female, making gender a well-recognized risk factor in this condition [24], hence the gender bias in this study. Optic nerve color photography combined with modified photoplethysmography, a form of ophthalmodynamometry, was performed on all participants. Lumbar puncture was performed on patients within three days after the modified photoplethysmography. The modified photoplethysmography device consists of a force transducer surrounding a central contact lens, which allows imaging of the optic disc under a dynamic range of induced intraocular pressure (IOPi). Study approval was granted by the University of Western Australia Human Ethics Committee (Approval #: 2015–11-756-A-2.), adhering to the tenets of the Declaration of Helsinki. Participants were to have no prior history of retinal or optic nerve pathology and were required to have transparent ocular media. The Extreme Gradient Boost Model (XGB) was derived from a training/test set of 21 subjects. A total of 129,600 data points were sampled from the images, 56,932 arterial and 72,668 venous data points. An 80/20 training/test data split was implemented. A further 7 subjects were used as hold-out test cases in evaluating this model. Further details of the model are published in previous work [3]. In the current study, a total of 9 subjects were included in the analysis: 7 subjects were in the hold-out test group from the original dataset (cases 1 to 7), and 2 subjects (cases 8 and 9) were in the cautionary subset.
The latter subset demonstrated a wide difference between measured and predicted intracranial pressure and conflicting predictions from the arterial and venous models were found in ongoing evaluation of the model.
The distributions of the harmonic regression wave amplitude (HRWa) and most of the Fourier coefficients were non-normal. Therefore the median was used to measure central tendency, and dispersion was estimated using the interquartile range (IQR); additionally, the range was computed. Hypothesis tests were conducted using the Wilcoxon test with Bonferroni correction. The probability density function for each arterial and venous model was plotted. The definite integral (DIICP) was calculated for all cases by estimating the definite integral ±1 cm water from the median of the predicted intracranial pressure. Computation of DIICP was confirmed by calculating the probability between the minimum and maximum bounds of the PDF, which returned a value of 1.00 with absolute error < 0.00012 for all cases. The Kolmogorov-Smirnov statistic (KS) and Anderson-Darling statistic (ADS) were used as a quantitative measure of the deviation of the PDF from a normal distribution, the former being more sensitive to the body and the latter to the tails of the distribution [19]. The two-sample Kolmogorov-Smirnov statistic (tsKS) was calculated as a measure of the magnitude of the distributional difference between arterial and venous PDFs. The larger the test statistic, the more significant the difference between the two distributions. The maximum distance between the empirical cumulative distribution functions was used to visualize the result [25]. The difference between median ICPs (DiffICPmd) was calculated as a measure of agreement between predicted and measured intracranial pressure. Multivariate analysis of variance (MANOVA) was used in the assessment of the statistical significance of the difference in the means between DiffICPmd and imaging variables, including test laterality, the levels of IOPi, the number of data points evaluated in each image, and the model type. 
The MANOVA also included the measures of central tendency together with the distribution characteristics, namely standard deviation (sd), skew, and kurtosis (indicating tail weight and not peakedness) [26] (corresponding to the first to the fourth moments), as well as DIICP, ADS, KS, distribution type as determined using Cullen-Frey graphs, and case subtype (cautionary subset/hold-out test). Shannon entropy, also known as information entropy, is a concept from information theory that quantifies the uncertainty, in this case, associated with a probability density distribution. Shannon entropy of the PDF was computed using the entropy library of the R statistical package [27]. It is defined mathematically as:
$H(x) = -\sum_{x} P(x) \log P(x)$
Where:
H(x) is the entropy of the random variable x.
P(x) represents the probability of a specific outcome x in the distribution.
The summation (∑) is taken over all possible outcomes in the distribution.
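The definition above can be sketched in Python as a plug-in estimate over binned predictions. This is an illustration only and not the R entropy library used in the study: the 1 cm water bin width, the natural logarithm, the function names, and the sample values are assumptions.

```python
import math
from collections import Counter


def bin_counts(values, bin_width=1.0):
    """Count how many values fall into each bin of the given width."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    return list(bins.values())


def shannon_entropy(counts, base=math.e):
    """Plug-in Shannon entropy H = -sum(p * log(p)) from bin counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # empty bins contribute nothing to the sum
            p = c / total
            h -= p * math.log(p, base)
    return h


# Hypothetical predictions in cm water (not data from the study).
peaked = [20.1, 20.4, 20.7, 20.2, 21.3, 19.8, 20.9, 20.5]
flat = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5, 40.5]

h_peaked = shannon_entropy(bin_counts(peaked))
h_flat = shannon_entropy(bin_counts(flat))  # every value in its own bin
```

With eight values spread over eight separate bins, the flat sample reaches the maximum entropy for eight outcomes, log(8), whereas the peaked sample yields a lower value.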
Polychoric and Pearson correlations (for discrete and continuous variables, respectively) were used to examine the relationship between DiffICPmd on one side and imaging and distribution parameters on the other for the venous and arterial models separately. A p-value <0.05 was considered statistically significant for all tests. Fig 1 is the workflow schematic.
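For the continuous variables, the Pearson correlation can be written out directly, as in the short Python sketch below. The function name and the paired values are hypothetical and were chosen only to show a negative association between a DIICP-like variable and a DiffICPmd-like variable.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values (not data from the study).
di_values = [16.0, 14.5, 13.0, 12.0, 4.4, 3.3]      # percent
diff_values = [1.4, 2.0, 5.9, 3.1, 22.0, 37.0]      # cm water
r = pearson_r(di_values, diff_values)
```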
Two cautionary subset cases (8 and 9) were subsetted from seven hold-out test cases and demonstrated a wide and conflicting difference between measured and predicted intracranial pressure from the arterial and venous XGB models. All cases underwent data pre-processing, and descriptive statistics and hypothesis tests were computed from the retinal vascular pulse parameters. Intracranial pressure predictions were derived from the retinal arterial and venous parameters independently. Probability density functions were generated from intracranial pressure predictions from both XGB models, where the median was considered the most favorable compared to the mean and the mode. This was likely because the median represents the geometric mean of a log-normal distribution and is supported by findings from previous work [3]. Correlations were computed between the absolute difference between the median predicted and measured intracranial pressure (DiffICPmd) and imaging characteristics (n = number of vascular data points analyzed, Bilateral = both eyes tested, nIOPi = number of induced intraocular pressure levels applied during imaging) and distribution characteristics (DIICP = definite integral ±1 cm water of the median, tsKS = two-sample Kolmogorov-Smirnov statistic, ADS = Anderson-Darling statistic, KS = Kolmogorov-Smirnov statistic, sEnt = Shannon entropy).
Results
Modified photoplethysmography was performed bilaterally in five cases (10 eyes) and unilaterally in four cases (4 eyes). Both cases in the cautionary subset had unilateral imaging performed. Whereas the mean of DiffICPmd in the bilateral tested group was 3.07±0.56, this increased significantly in the unilateral tested group to 14.54±4.37, p<0.01. A total of 19,905 data points were sampled from the images of the study group: 7,617 arterial and 12,288 venous data points. The median measured intracranial pressure was 22 cm water, and the IQR was 10 cm water. The distribution was skewed to the left (skew = -0.278, kurtosis = 1.894). Therefore, the weight of the distribution of pulsation data features was at an ICP ≥22cm water.
The median, IQR of the HRWa in the hold-out test group was higher in the retinal veins (6.123, 0.38) compared to the retinal arteries (4.929, 0.44), p<0.0001. Further details of the Fourier coefficients subsetted by vessel type and study group are listed in Table 1. A violin plot comparing the statistical properties of the HRWa is demonstrated in Fig 2.
Comparing the distribution of the harmonic regression wave amplitude in the hold-out and cautionary subsets.
Table 2 and the ridgeline plot (Fig 3) provide a summary of the characteristics and probability density distributions for the predicted intracranial pressure for the hold-out test cases and the cautionary subset (cases 8 and 9). It can be observed that DiffICPmd for the latter two cases was at the higher limit for both XGB models compared to the hold-out test cases. The arterial DiffICPmd was (22.22, 37.09) and venous (15.01, 23.20) for cases 8 and 9, respectively. Of the parameters of the probability density distribution, the arterial DIICP in case 8 was 3.27% and in case 9 was 4.39%, both lower (mean±standard error, 3.83±0.56%, sd = 0.792) than the DIICP computed from all hold-out test cases (14.14±1.07%, p<0.03, sd = 2.5). The venous DIICP in case 8 was 24.46%, and that of case 9 was 8.22%, both >2 standard deviations from the mean DIICP of the hold-out test cases. Whereas the median (IQR) for sd, kurtosis, and skew in the hold-out test cases was 7.852 (2.250), 4.845 (1.835), and 1.0850 (0.81), indicating a leptokurtic distribution (kurtosis>3), the values for the cautionary cases were 11.843 (4.861), 2.425 (1.1025), and -0.0450 (0.445), indicating a platykurtic distribution (kurtosis<3). When the distributions were superimposed as shown in Fig 4, it can be observed that cases with a close overlap of arterial and venous probability density distributions intuitively have close predictive outcomes (Table 2).
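The summary statistics quoted above can be reproduced with a short Python sketch. The two arterial DIICP values (3.27% and 4.39%) are taken from the text. The moment-based skewness and kurtosis formulas (non-excess kurtosis, so a normal distribution scores 3) and the evenly spaced grid standing in for a uniform distribution are assumptions for illustration, and the study's software may apply different estimators.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample sd divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = statistics.fmean(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; positive for a right-skewed sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (non-excess); a normal distribution gives 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Arterial DIICP (%) for cases 8 and 9, as reported in the text.
arterial_di = [3.27, 4.39]
mean_di = statistics.fmean(arterial_di)
sd_di = statistics.stdev(arterial_di)
se_di = standard_error(arterial_di)

# Evenly spaced grid used as a stand-in for a uniform distribution.
uniform_grid = [i / 100 for i in range(1001)]
k_uniform = kurtosis(uniform_grid)
s_uniform = skewness(uniform_grid)
```

The two reported values give a mean of 3.83, a sample sd of 0.792, and a standard error of 0.56, matching the figures in the text. The uniform grid has a kurtosis close to 1.8 and a skew of essentially zero, consistent with the platykurtic, near-symmetric pattern described for the cautionary cases.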
In contrast to the dominant right-skewed distribution in most cases, in cases 8 and 9, the distribution of the arterial predictions converges on a uniform distribution.
The two-sample Kolmogorov-Smirnov statistic (tsKS) provides a quantitative comparison between two distributions across the whole range rather than just a point estimate. Within a single case, the closer the approximation of the distributions from the arterial and venous models, the lower the tsKS value. Case 4 demonstrates the lowest tsKS statistic (0.080897, p<0.003), and case 7 is the highest (0.48101).
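As an illustration of the statistic itself, the maximum vertical distance between two empirical cumulative distribution functions can be computed as in the Python sketch below. The function name and the toy samples are assumptions, and the sketch returns only the statistic, not a p-value.

```python
import bisect


def ks_two_sample(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two ECDFs."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    n, m = len(a_sorted), len(b_sorted)
    largest_gap = 0.0
    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    for x in a_sorted + b_sorted:
        ecdf_a = bisect.bisect_right(a_sorted, x) / n
        ecdf_b = bisect.bisect_right(b_sorted, x) / m
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Toy samples (not data from the study).
identical = ks_two_sample([1, 2, 3, 4], [1, 2, 3, 4])   # complete overlap
shifted = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])     # partial overlap
disjoint = ks_two_sample([1, 2, 3], [4, 5, 6])          # no overlap
```

Identical samples give a statistic of 0 and fully separated samples give 1, so lower values correspond to closer agreement between the two distributions.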
Ridgeline plot of the probability density distributions observed in Fig 3 is derived from the arterial and venous XGB models. It highlights the convergence on skewed distributions for most predictions. The level of concordance between arterial and venous distributions can be quantitatively estimated in Fig 4. The distributions are superimposed, and the tsKS was computed. The lower the value of tsKS, the higher the concordance between the distributions. For all cases, the value of tsKS was statistically significant. The exemplar case (Case 4) demonstrated the lowest tsKS statistic (0.080897, p<0.003). Although Case 7 had the highest tsKS statistic (0.48101), this was offset by a lower DiffICPmd (arterial 0.41, venous 7.56). For this case, both venous and arterial distributions were multimodal and converged on the broader family of beta distributions. Cases 8 and 9 showed tsKS statistic values towards the higher end (0.47723 and 0.3755, respectively), confirming the lack of concordance between arterial and venous probability density distributions (Fig 5).
(A-D) A comparison of the empirical cumulative distribution functions (ECDF) of intracranial pressure predictions derived from the Extreme Gradient Boost algorithm from four cases with contrasting two-sample Kolmogorov-Smirnov statistics (tsKS). The two hold-out test cases demonstrating the lowest tsKS, cases 4 and 5 (A, B), show favorable concordance between venous and arterial derived predictions, in contrast to the subset cases 8 and 9 (C, D), where the concordance is poor. The difference in separation of the ECDF between the two models can be observed in cases 8 and 9 (C, D). The tsKS statistic depends on a ratio parameter consisting of the product of the numbers of data points in the two distributions divided by their sum [25]. Red = arterial model, Blue = venous model.
Multivariate analysis of variance (MANOVA) was used to evaluate the statistical significance of the relationship between DiffICPmd and both imaging and model output distribution characteristics. The results are summarized in Table 3. Of the imaging characteristics, both laterality and the number of induced intraocular pressure levels (nIOPi) achieved statistical significance. Interestingly, the number of data points analyzed failed to achieve statistical significance. Of the single distribution parameters, those that defined the shape of the distribution (sd, skew, kurtosis, DIICP) demonstrated significant associations, as did the measure of between-distribution agreement (tsKS). In contrast, parameters that indicated convergence upon a normal distribution (ADS, KS) were not significant. These results suggest that bilateral modified photoplethysmography undertaken under multiple levels of induced intraocular pressure is more likely to generate probability density distributions with favorable characteristics.
The associations can be further clarified by evaluating the strength and direction of the correlations by computing the correlation statistic. This can be visualized in the correlation matrix in Fig 6. Five conclusions can be drawn from the matrix:
- The importance of bilateral imaging and recording the pulsation characteristics under multiple levels of IOPi was supported by a highly significant and negative correlation statistic (-0.59, -0.49).
- Except for (sd), which was significantly and positively correlated, distribution shape parameters (skew, kurtosis, DIICP) were all significantly and negatively correlated with DiffICPmd. The negative correlation indicates that a right-skewed narrower distribution with fewer outliers favors a more accurate XGB prediction.
- The higher the concordance between arterial and venous XGB distributions, the more accurate the prediction. The positive correlation is indicative that both parameters need to be smaller for more favorable predictions.
- Laterality was highly and negatively correlated with tsKS (-0.98) and moderately and positively correlated with DIICP (0.33). This result suggests that bilateral imaging was significant for analyzing the concordance measure (tsKS). In contrast, with DIICP, a weaker strength of association indicated that this measure was less affected by test laterality.
- In a comparison of the four methods estimating randomness in the PDF, correlations with DiffICPmd decreased in the following order: lower order distribution moments (sd, skew, kurtosis), DIICP, tsKS, and sEnt.
Features from the top row are of interest. There was a significant negative correlation of DiffICPmd with laterality (bilateral was numerically coded as 2 and unilateral was 1 for this analysis) of -0.59. There was a moderate to low correlation with parameters of the distribution of the XGB-derived prediction (ADS = Anderson-Darling statistic, KS = Kolmogorov-Smirnov statistic). However, the correlation with DIICP = the definite integral ±1 cm water of the median was strongly negative (-0.52), indicating that the higher the weight of the area under the curve within these bounds, the more accurate was the agreement between predicted and measured intracranial pressure. Similarly, the correlation with tsKS = two-sample Kolmogorov-Smirnov statistic (0.49) was significant, indicating that the higher the overlap between the vascular model distributions, the higher the agreement with measured intracranial pressure. Comparably, Shannon entropy (sEnt) showed a strong positive correlation (0.48), indicating convergence to a uniform distribution (increased randomness) with higher DiffICPmd values. nIOPi = the levels of induced intraocular pressure applied during the imaging, n = total number of tested data points.
Shannon entropy varied over a narrow range for both models. The mean (range) for the arterial model was 5.577 (5.171–5.848) compared to the venous model 5.451 (5.353–5.592). The Pearson correlation between sEnt and DiffICPmd was 0.4832±0.1788 (p < 0.002). Although Pearson correlation did not achieve statistical significance when subsetted by vascular model, significance of sEnt as a discriminating parameter was suggested by distinguishing convergence of the PDF to a uniform distribution, where sEnt is maximised when the models were aggregated (Table 3).
Pearson correlations evaluating the relationship between DiffICPmd and DIICP showed a negative correlation between these two parameters (-0.5213, p< 0.004). The significance persisted for the arterial model only (0.75±0.20, p<0.007). In contrast, the venous model did not achieve statistical significance (0.79±0.17, p<0.58) for this correlation, as demonstrated in Fig 7. Similarly, neither the mean KS nor ADS achieved statistical significance in either model (p<0.09). Other parameters of the XGB derived probability density, in particular the distribution type, sd, kurtosis, and skew, were significant in both the MANOVA model and in their polychoric correlations with DiffICPmd (p<0.003). These differences did not persist when tested by the vascular model subtype, possibly due to the limited size of the dataset. The two-sample Kolmogorov-Smirnov statistic showed a statistically significant and positive correlation (0.4942±0.18, p< 0.001) with DiffICPmd; however, the statistical significance was not sustained when tested for the arterial and venous models independently.
Pearson correlation between the definite integral ±1 cm around the median of the probability density distribution (DIICP) and the absolute difference between predicted and measured intracranial pressure (DiffICPmd) for the A) arterial and B) venous models. Only A) the arterial model (r = -0.76, p = 0.02) achieved statistical significance, in contrast to B) the venous model (r = -0.10, p = 0.799). This indicated that the arterial model was a more discriminatory indicator of agreement between measured and predicted intracranial pressure.
While in the venous model, one-third of the predictions (33.33%) converged on a log-normal distribution, the dominant distributions in the arterial model were logistic and beta distributions (33.33% each), except for the arterial models of cases 8 and 9, which converged on a uniform distribution. This finding indicated that the point estimate from the arterial prediction was no better than random in these two cases. Furthermore, this distribution contributed to both cases’ low arterial DIICP values. Notably, the venous and arterial distributions in case 8 demonstrated negative skews, a departure from the skews of all other cases. In case 9, the venous distribution converged on a heavy-tailed beta distribution, contributing to the low venous DIICP value in this case.
Discussion
Despite a limited-sized dataset precluding regression analysis, the results indicated that agreement between the arterial and venous models could be evaluated qualitatively by assessment of the overlap between the probability density distributions or quantitatively by computing the tsKS statistic. Additionally, measures of distribution moments consistent with a narrow-shaped leptokurtic distribution (narrow sd, positive skewness, high kurtosis, and high DIICP) supported favorable predictions. Quantifying the uncertainty in the PDF by estimating the Shannon entropy could provide a further indication of convergence to a uniform distribution, where entropy is maximised, hence indicating degradation of ICP predictions. However, sEnt showed a lower correlation than the other measures. This may be due to its sensitivity to independence of probability events. If events are not independent, the joint entropy can be less than the sum of the individual entropies, thereby reducing the value for heavy-tailed non-uniform probability density distributions.
Intuitively, imaging both eyes and applying multiple IOPi improved the predictive outcome. A quantitative comparison of the two distributions by the tsKS test is highly advantageous as its performance is independent of the distribution type. It has been applied as a measure of model classification in several other studies [28–30]. Its limitations include restriction to continuous distributions and higher sensitivity near the center of the distribution than at the tails. Unlike the one-sample KS test, it can be performed under more general conditions that allow for discontinuity, heterogeneity, and dependence across samples [31]. Of the two cases that showed the lowest tsKS statistic, it is interesting to note that case 4 demonstrated high agreement with measured intracranial pressure (DiffICPmd 1.34–1.47 cm water). In contrast, case 5 demonstrated a lower agreement (DiffICPmd 5.8–6 cm water). The distribution of intracranial pressures in the training set likely impacted the model performance. This is a recognized limitation of decision tree algorithms and is particularly relevant since case 5 had a measured intracranial pressure (17 cm water) located at the tail of the left-skewed intracranial pressure distribution in the XGB model training dataset [3, 32]. This was also demonstrated in the cautionary subset cases (8 and 9), where measured intracranial pressure was in a lower range (11 and 8 cm water, respectively). Moreover, the left-skewed distribution of measured intracranial pressure in the analyzed data set meant that the weight of the data points was in the body of the distribution ≥22 cm water (Fig 8). Hence, in the absence of sufficient training data, the model’s predictive accuracy was degraded at lower intracranial pressure ranges <22 cm water.
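As an illustration of the distribution-free nature of the tsKS statistic, the following sketch computes it directly as the largest vertical gap between two empirical cumulative distribution functions. The sample values are simulated and hypothetical, and the function name `ks_two_sample` is introduced here for illustration only; it is not the implementation used in the analysis.

```python
import bisect
import math
import random


def ks_two_sample(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n_a, n_b = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both empirical CDFs are step functions, so the supremum of their
    # absolute difference is attained at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / n_a
        cdf_b = bisect.bisect_right(b_sorted, x) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d


rng = random.Random(1)
# Hypothetical venous and arterial prediction sets in cm water.
venous = [rng.lognormvariate(math.log(24), 0.15) for _ in range(500)]
arterial_close = [rng.lognormvariate(math.log(25), 0.15) for _ in range(500)]
arterial_flat = [rng.uniform(5, 45) for _ in range(500)]

print("concordant models:", round(ks_two_sample(venous, arterial_close), 3))
print("uniform arterial model:", round(ks_two_sample(venous, arterial_flat), 3))
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap), and no assumption about the parametric form of either distribution is needed.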
There is a low contribution of data points to the model at intracranial pressure levels <17 and >43 cm water with two participants below (K, M) and above (C, F) these boundaries, respectively. ICP = intracranial pressure in cm water [3].
The distribution shape parameters (skew, kurtosis, and DIICP) had a strong negative correlation with DiffICPmd. Therefore, a more positive skew, higher kurtosis, and higher DIICP were associated with a favorable agreement of predicted with measured intracranial pressure. The exception was sd, which had a strong positive correlation with DiffICPmd, so that a lower sd was consistent with an accurate prediction. Given that the probability density distribution is normalized, it should exhibit a unitary integral (the area under the probability density distribution should equal 1). Therefore, a higher DIICP would equate to a lower dispersion of predictions around the median. Interestingly, DIICP correlated more strongly with the absolute difference between measured and predicted intracranial pressure than tsKS did. A possible explanation is that DIICP depends on the convergence of one distribution and evaluates a narrow region of that distribution from which DiffICPmd is computed. This was in contrast to tsKS, which is dependent on the total area of two model distributions and, theoretically, more sensitive to the entire distribution range, including the tail of the distribution, where outliers are located.
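The link between DIICP and dispersion around the median can be sketched as follows. As a simplifying assumption, the definite integral of a fitted PDF is replaced here by the empirical proportion of predictions within ±1 cm water of the sample median; the function name `diicp` and the simulated samples are hypothetical.

```python
import math
import random
import statistics


def diicp(predictions, half_width=1.0):
    """Percentage of predictions within +/- half_width of the sample median."""
    med = statistics.median(predictions)
    inside = sum(1 for p in predictions if abs(p - med) <= half_width)
    return 100.0 * inside / len(predictions)


rng = random.Random(2)
# Hypothetical prediction sets in cm water (illustrative only).
narrow = [rng.lognormvariate(math.log(25), 0.10) for _ in range(5000)]
flat = [rng.uniform(10, 50) for _ in range(5000)]

print("narrow log-normal:", round(diicp(narrow), 2), "%")
print("uniform over 40 cm water:", round(diicp(flat), 2), "%")
```

For a uniform distribution of width w, the integral over a 2 cm water window is 2/w, so a hypothetical 40 cm water range yields about 5% regardless of where the point estimate lies, which is why convergence on a uniform distribution produces a low DIICP.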
The convergence to a uniform distribution was associated with a lower agreement with measured intracranial pressure, particularly for the arterial model. The venous predictions approximate the ideal log-normal distribution more closely than the arterial predictions. The current results suggest that a transition through log-normal and related skewed distributions such as the beta family, gamma, or logistic distributions may occur before convergence to a uniform distribution in the venous system. The dominance of skewed and, particularly, the log-normal distribution is consistent with the results of the central limit theorem, which states that a random variable that is the sum of many independent variables or variables with weak interactions converges on a Gaussian distribution. Likewise, a random variable resulting from a multiplicative product with synergistic and strong interactions of several variables converges on a log-normal distribution, which does not violate the central limit theorem as a log-normal distribution converges to a normal distribution in the logarithmic domain [8, 33, 34]. The majority of interactions in highly interconnected systems, especially in biological systems, are multiplicative and synergistic rather than additive. Therefore, a log-normal distribution results from this interaction [33]. This may explain the dominance of log-normal and skewed distributions closely related to the log-normal distribution of intracranial pressures derived from the XGB algorithm. Strong non-linear dynamics dominated interactions between intracranial pressure, intraocular pressure, and the retinal vascular pulse. In a recent publication, we used a linear mixed-effects model to correlate the harmonic amplitude distribution in the retinal vascular system with intraocular and intracranial pressure. This approach computed the variance of the model parameters, thereby quantitatively estimating their contribution to the pulsation dynamics at the optic disc in linear space.
It was demonstrated that linear interactions of intraocular pressure, intracranial pressure, and the retinal vascular pulse accounted for <9% of the variance. This was due to a non-constant variance (heteroscedasticity) of the retinal vascular pulse amplitude between individuals [35]. In previous work, a generalized additive approach was used to define the geometry of the non-linear component of the interaction. This approach accounted for 49.21% and 62.96% of the variance in the arterial and venous models, respectively [36]. Hence, non-linear synergistic dynamics between the retinal vascular pulse, intraocular pressure, and intracranial pressure may account for the convergence on skewed distributions. This finding may also account for the outcome from the Bland-Altman analysis in previous work, where the highest agreement was observed between measured and median predicted intracranial pressure. The geometric mean of a log-normal distribution is equal to the median [3, 33], which therefore represents the ideal measure of central tendency for this type of skewed distribution.
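The multiplicative argument above can be checked numerically with a short simulation. It is a sketch under stated assumptions: each simulated value is the product of 30 independent positive factors drawn uniformly from 0.8 to 1.25, which are arbitrary illustrative choices and not physiological parameters.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness (third standardized central moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
products = []
for _ in range(5000):
    value = 1.0
    for _ in range(30):  # 30 hypothetical multiplicative factors
        value *= rng.uniform(0.8, 1.25)
    products.append(value)

# Taking logarithms turns the product into a sum, so the central limit
# theorem applies in the logarithmic domain.
logs = [math.log(v) for v in products]
geometric_mean = math.exp(sum(logs) / len(logs))
median = statistics.median(products)

print("skewness of products:", round(skewness(products), 2))
print("skewness of log(products):", round(skewness(logs), 2))
print("geometric mean:", round(geometric_mean, 3), "median:", round(median, 3))
```

The products are positively skewed while their logarithms are close to symmetric, and the geometric mean and median nearly coincide, consistent with the use of the median as the measure of central tendency for log-normal predictions.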
Although the pulsation patterns and lumbar puncture-measured intracranial pressure in cases 8 and 9 were consistent with a normal intracranial pressure pattern, where venous pulsation amplitudes are higher than arterial amplitudes [35], these cases highlight the challenges in non-invasive intracranial pressure estimation, where independent input variables can display a wide range of variance despite having similar intracranial pressures. Furthermore, the inconsistency in intracranial pressure prediction in the latter two cases is also related to the inherent limitation of a decision tree algorithm. In contrast to a linear model, a recognized limitation of decision tree algorithms is their inability to extrapolate values by extension of the model boundaries. Therefore, they may provide spurious results for values at the extremes of the training set [32]. In both cases 8 and 9, measured intracranial pressure was at the lower end of the normal range, and the Extreme Gradient Boost training set included only two subjects with a measured intracranial pressure in this range (Fig 8). Therefore, a larger number of training examples would improve predictive accuracy for low intracranial pressure cases. However, a long observation period would be required based on the experience from the current study. The selective nature of modified photoplethysmography requires subject cooperation, which precludes imaging in a population of subjects with significant neurological disorders. Therefore, imaging was limited to participants with minimal neurological impairment. Moreover, lumbar puncture is invasive and unlikely to be performed based on a low clinical suspicion, further limiting case recruitment. Our research group is currently developing a handheld modified photoplethysmography solution to simplify data acquisition and address some of these restrictions.
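The extrapolation limitation described above can be illustrated with a toy example. This sketch fits a single small regression tree, written from scratch, to a hypothetical one-dimensional predictor whose true relationship with the target is linear; it is not the XGB model from this study. Boosted ensembles of such trees share the property, since they sum piecewise-constant leaf values.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(xs, ys, depth=3, min_leaf=2):
    """Fit a tiny 1-D regression tree by greedy squared-error splitting."""
    if depth == 0 or len(xs) < 2 * min_leaf:
        return {"value": sum(ys) / len(ys)}
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return {"value": sum(ys) / len(ys)}
    i = best[1]
    left, right = pairs[:i], pairs[i:]
    return {
        "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
        "left": fit_tree([x for x, _ in left], [y for _, y in left], depth - 1, min_leaf),
        "right": fit_tree([x for x, _ in right], [y for _, y in right], depth - 1, min_leaf),
    }


def predict(tree, x):
    """Walk down the tree until a leaf is reached and return its mean."""
    while "value" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]


# Hypothetical training data: predictor and target both span 20 to 40.
train_x = [float(v) for v in range(20, 41)]
train_y = list(train_x)  # true relationship is y = x
tree = fit_tree(train_x, train_y)

for query in (8.0, 20.0, 30.0, 40.0, 60.0):
    print(query, "->", round(predict(tree, query), 2))
```

Queries below or above the training range receive the same value as the nearest edge of that range, because every prediction is the mean of training targets in some leaf. A model trained mostly on higher pressures therefore cannot return values below its lowest leaf mean.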
The number of cases in this analysis was limited. A power analysis for medium effect size and statistical power of 80% showed that a population of 34 cases is required to confirm the DIICP as a discriminatory parameter and minimize the Type II error (failing to reject a false null hypothesis). However, the initial analysis showed that it could supplement the PDF and potentially serve as an individualized statistical parameter of reliability of the XGB approach. The statistical power of machine learning algorithms demands a large sample size in the training set [37]. This allows the inclusion of case variants and extension of the range of interactions between the model variables. Several factors determine the volume of required training data, including model complexity, the complexity of the learning algorithm, label features of interest, tolerable error margin, and variance in the input features. It remains unresolved how to best determine the sample size for a particular model when analyzing medical imaging data [38].
Differences in dynamic range in the pulsation parameters between the retinal veins and arteries are highlighted in both Table 1 and Fig 2. These differences likely arise from structural and functional variability in the retinal vasculature and their characteristic interactions with pressurized anatomical chambers along their respective paths. The retinal arterioles lack an elastic lamina but possess a well-developed medial smooth muscle structure with 5–7 layers. In contrast, retinal veins have thinner walls (13.929±0.041 μm) compared to arteries (17.559±0.062 μm), and a thinner muscle layer of 3–4 layers that transitions to fibroblasts near the optic disc [39]. These structural differences influence the compliance and impedance characteristics of the retinal vasculature. While direct measurements of compliance and incremental modulus of elasticity (Einc) in the retinal vasculature are lacking, inferences from systemic vessels can be drawn, where veins exhibit higher compliance, with a sigmoidal pressure-volume curve compared to the curvilinear relationships seen in arteries [40–45]. These differences in compliance likely lead to differences in pulsation characteristics. Although physiological interactions of the retinal vasculature with pressurized anatomical chambers (the intracranial, intraorbital, and intraocular spaces) are yet to be fully understood, our recent work highlights previously unrecognized and clinically significant interactions of the retinal arterioles with intracranial pressure, evidenced by the ability to generate intracranial pressure predictions from the arterial tree with an accuracy comparable to that of the retinal veins [3]. However, phase relationships with intracranial pressure remain unknown in this part of the retinal vascular system.
In contrast, the intracranial pressure wave may dominate the pulse frequencies in the structurally thinner and higher compliance retinal venous wall, potentially explaining the higher venous Fourier coefficients, higher HRWa, and its attenuation characteristics [35, 46]. Observed differences in the phase of the retinal vascular pulse support this explanation, with experimental data showing phase congruence between retinal venous pulsation and intracranial pressure [47]. Future research in this field is recommended to shed light on the physiological interactions of the retinal vessels with anatomically related pressurized chambers.
In summary, the probability density distribution for the cautionary cases demonstrated unusual features. The arterial distribution in both cases converged on a uniform distribution, and the venous distribution of case 9 converged on a heavy-tailed beta distribution, all contributing to a low DIICP statistic. The venous and arterial distributions in case 8 showed negative skews, contrary to all other cases in this study. The unusual shape parameters in case 8 resulted in a DIICP statistic more than two sd from the mean for both models. The probability density distribution shape parameters can therefore be significant indicators of the predictive accuracy of the machine learning model.
Conclusions
Despite a limited-sized dataset, the results support a dual and complementary analysis approach from independently derived retinal arterial and venous intracranial pressure estimates. Additionally, the probability density distribution of the ICP predictions should ideally converge on a log-normal or related positively skewed distribution (beta, gamma, logistic) with a low sd and high kurtosis, indicating fewer outliers. Interestingly, Shannon entropy had the lowest correlation with DiffICPmd among the tested distribution features. A larger area under the probability density curve ±1 cm water from the median (DIICP) and a higher concordance between the arterial and venous distributions, as indicated by a lower tsKS statistic, may provide individualized accuracy parameters of the machine learning predictive outcome and further support a clinical decision-making algorithm.
References
- 1. Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput Sci. 2021;7:e623. pmid:34307865
- 2. Barrett JP. The coefficient of determination—some limitations. Amer Statist. 1974;28(1):19–20.
- 3. Abdul-Rahman A, Morgan W, Yu DY. A machine learning approach in the non-invasive prediction of intracranial pressure using Modified Photoplethysmography. PLoS One. 2022;17(9):e0275417. pmid:36174066
- 4. Severini TA. Joint distributions. In: Probability, statistics, and stochastic processes. John Wiley & Sons; 2012. p. 156–247.
- 5. Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive statistics and normality tests for statistical data. Ann Card Anaesth. 2019;22(1):67. pmid:30648682
- 6. Sprent P. Location tests for single and paired samples. In: Data driven statistical methods. Routledge; 2019. p. 119–140.
- 7. Johnson NL, Kotz S, Balakrishnan N. Continuous univariate distributions, volume 2. vol. 289. John Wiley & Sons; 1995.
- 8. Limpert E, Stahel WA, Abbt M. Log-normal distributions across the sciences: keys and clues: on the charms of statistics, and how mechanical models resembling gambling machines offer a link to a handy way to characterize log-normal distributions, which can provide deeper insight into variability and probability-normal or log-normal: that is the question. BioScience. 2001;51(5):341–352.
- 9. Severini TA. Properties of Probability Distributions. In: Elements of distribution theory. vol. 17. Cambridge University Press; 2005. p. 1–38.
- 10. Carney M, Cunningham P, Dowling J, Lee C. Predicting probability distributions for surf height using an ensemble of mixture density networks. In: Proceedings of the 22nd international conference on Machine learning; 2005. p. 113–120.
- 11. Cavuoti S, Amaro V, Brescia M, Vellucci C, Tortora C, Longo G. METAPHOR: a machine-learning-based method for the probability density estimation of photometric redshifts. Mon Not R Astron Soc. 2017;465(2):1959–1973.
- 12. Munakomi S, Das M. Intracranial Pressure Monitoring. StatPearls [Internet]. 2019;.
- 13. Bø SH, Lundqvist C. Cerebrospinal fluid opening pressure in clinical practice–a prospective study. Journal of Neurology. 2020;267:3696–3701. pmid:32681283
- 14. Jonas JB, Wang N, Yang D, Ritch R, Panda-Jonas S. Facts and myths of cerebrospinal fluid pressure for the physiology of the eye. Prog Retin Eye Res. 2015;46:67–83. pmid:25619727
- 15. Zacchetti L, Magnoni S, Di Corte F, Zanier ER, Stocchetti N. Accuracy of intracranial pressure monitoring: systematic review and meta-analysis. Crit care. 2015;19:1–8. pmid:26627204
- 16. Wang J, Yin MJ, Wen HC. Prediction performance of the machine learning model in predicting mortality risk in patients with traumatic brain injuries: a systematic review and meta-analysis. Bmc Med Inform Decis. 2023;23(1):142. pmid:37507752
- 17. van Hal S, van der Jagt M, van Genderen M, Gommers D, Veenland J. Using Artificial Intelligence to Predict Intracranial Hypertension in Patients After Traumatic Brain Injury: A Systematic Review. Neurocrit Care. 2024; p. 1–12. pmid:38212559
- 18. Severini TA. Probability, statistics, and stochastic processes. John Wiley & Sons; 2012.
- 19. Razali NM, Wah Y. Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. JOSMA. 2011;2(1):21–33.
- 20. Pettitt A. Cramér-von Mises statistics for testing normality with censored samples. Biometrika. 1976;63(3):475–481.
- 21. Chakravarti IM, Laha RG, Roy J. Handbook of methods of applied statistics. Wiley Series in Probability and Mathematical Statistics (USA) eng. 1967;.
- 22. Stephens MA. EDF statistics for goodness of fit and some comparisons. J Am Stat Assoc. 1974;69(347):730–737.
- 23. Shannon CE. A mathematical theory of communication. The Bell system technical journal. 1948;27(3):379–423.
- 24. Chen J, Wall M. Epidemiology and risk factors for idiopathic intracranial hypertension. Int Ophthalmol Clin. 2014;54(1). pmid:24296367
- 25. Feller W. On the Kolmogorov-Smirnov limit theorems for empirical distributions. Ann Math Stat. 1948;19(2):177–189.
- 26. Westfall PH. Kurtosis as peakedness, 1905–2014. RIP. Amer Statist. 2014;68(3):191–195.
- 27. Hausser J, Strimmer K. entropy: Estimation of Entropy, Mutual Information and Related Quantities; 2021. Available from: https://CRAN.R-project.org/package=entropy.
- 28. Wang F, Wang X. Fast and robust modulation classification via Kolmogorov-Smirnov test. IEEE. 2010;58(8):2324–2332.
- 29. Blachnik M, Duch W, Kachel A, Biesiada J. Feature Selection for Supervised Classification: A Kolmogorov-Smirnov Class Correlation-Based Filter. In: AIMeth, Symposium On Methods Of Artificial Intelligence. Gliwice, Poland (10-19 November 2009); 2009.
- 30. Fadlallah B, Seth S, Keil A, Principe J. Quantifying cognitive state from EEG using dependence measures. IEEE. 2012;59(10):2773–2781. pmid:22851234
- 31. Massey FJ Jr. The Kolmogorov-Smirnov test for goodness of fit. J Am Stat Assoc. 1951;46(253):68–78.
- 32. Gao K, Yang Y, Zhang T, Li A, Qu X. Extrapolation-enhanced model for travel decision making: an ensemble machine learning approach considering behavioral theory. Knowl-Based Syst. 2021;218:106882.
- 33. Buzsáki G, Mizuseki K. The log-dynamic brain: how skewed distributions affect network operations. Nat Rev Neurosci. 2014;15(4):264–278. pmid:24569488
- 34. Mitchell RL. Permanence of the log-normal distribution. J Opt Soc Am. 1968;58(9):1267–1272.
- 35. Abdul-Rahman A, Morgan W, Jo Khoo Y, Lind C, Kermode A, Carroll W, et al. Linear interactions between intraocular, intracranial pressure, and retinal vascular pulse amplitude in the Fourier domain. PLoS One. 2022;17(6):e0270557. pmid:35763528
- 36. Abdul-Rahman A. Retinal Vascular Pulse Wave Analysis in the Fourier Domain. PhD Thesis. 2023;.
- 37. Raudys SJ, Jain AK, et al. Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Transactions on pattern analysis and machine intelligence. 1991;13(3):252–264.
- 38. Balki I, Amirabadi A, Levman J, Martel AL, Emersic Z, Meden B, et al. Sample-size determination methodologies for machine learning in medical imaging research: a systematic review. Can Assoc Radiol J. 2019;70(4):344–353. pmid:31522841
- 39. Hogan MJ, Feeney L. The ultrastructure of the retinal blood vessels: I. The large vessels. J Ultrastruct Res. 1963;9(1-2):10–28. pmid:14065353
- 40. Nichols WW, Edwards DG. Arterial elastance and wave reflection augmentation of systolic blood pressure: deleterious effects and implications for therapy. J Cardiovasc Pharmacol Ther. 2001;6(1):5–21. pmid:11452332
- 41. Moreno AH, Katz AI, Gold LD, Reddy R. Mechanics of distension of dog veins and other very thin-walled tubular structures. Circ Res. 1970;27(6):1069–1080. pmid:5494860
- 42. Keener J, Sneyd J. The circulatory system. In: Mathematical Physiology II: Systems Physiology. New York: Springer-Verlag; 2009. p. 471–522.
- 43. Feher JJ. Vascular function: Hemodynamics. In: Feher JJ, editor. Quantitative Human Physiology: An introduction. Cambridge, MA, United States: Elsevier-Academic Press; 2012. p. 498–507.
- 44. Burton AC. Arrangements of Many Vessels. In: Physiology and biophysics of the circulation: an introductory text. 2nd ed. Chicago, United States: Year Book Medical Publishers; 1972. p. 51–62.
- 45. Caro C, Pedley T, Schroter R. The systemic veins. In: The mechanics of the circulation. Cambridge, United States: Cambridge University Press; 2012. p. 426–466.
- 46. Abdul-Rahman A, Morgan W, Yu DY. Measurement of normal retinal vascular pulse wave attenuation using modified photoplethysmography. PLoS One. 2020;15(5):e0232523. pmid:32379837
- 47. Morgan WH, Lind CR, Kain S, Fatehee N, Bala A, Yu DY. Retinal Vein Pulsation Is in Phase with Intracranial Pressure and Not Intraocular Pressure. Invest Ophthalmol Vis Sci. 2012;53(8):4676–4681. pmid:22700710