Semiparametric maximum likelihood probability density estimation

doi:10.1371/journal.pone.0259111

Table 1.

Overview of domains and basis functions for semiparametric density estimation.

Table 2.

Overview of density estimators.

See S3 Appendix for details.

Table 3.

Test cases of probability densities: Densities 1–12 are supported on (−∞, ∞), density 13 on [−2, 2], density 14 on [0, 1] and density 15 on [0, ∞).

N(μ, σ²) denotes the normal density with mean μ and standard deviation σ, Beta(α, β) denotes the beta density with shape parameters α and β, and Gam(α, β) denotes the gamma density with shape parameter α and rate parameter β.

Table 4.

Mean integrated squared error (scaled by 10⁴) estimated as the average over 100 realisations for the test cases in Table 3.

For test cases 14 and 15, the semiparametric estimators allow for logarithmic boundary terms at any boundary point. The best method is indicated in bold. For test cases 9 and 10, the diffusion estimators are unstable; for test cases 4, 8 and 11, they give reasonable density estimates in most realisations but are unstable in some realisations. See S3 Appendix for further explanation.
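
The mean integrated squared error reported here is the expectation of ∫(f̂(x) − f(x))² dx, approximated by an average over independent realisations. As a minimal sketch of that computation, the following uses a plain Gaussian kernel density estimator with a normal-reference bandwidth as a stand-in estimator and a standard normal as the true density. The function names, grid, sample size, number of realisations and bandwidth rule are illustrative assumptions, not the settings used for the table.

```python
import math
import random
import statistics


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_kde(sample, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    n = len(sample)

    def estimate(x):
        return sum(normal_pdf(x, xi, bandwidth) for xi in sample) / n

    return estimate


def integrated_squared_error(f_hat, f_true, lo, hi, n_grid=121):
    """Trapezoidal approximation of the integral of (f_hat - f_true)^2 on [lo, hi]."""
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * (f_hat(x) - f_true(x)) ** 2
    return total * h


def mise(n_realisations=20, n_sample=100, seed=0):
    """Average the integrated squared error over independent samples."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_realisations):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_sample)]
        # Normal-reference rule of thumb (an assumption of this sketch).
        bandwidth = 1.06 * statistics.stdev(sample) * n_sample ** (-0.2)
        f_hat = gaussian_kde(sample, bandwidth)
        errors.append(integrated_squared_error(f_hat, normal_pdf, -5.0, 5.0))
    return sum(errors) / len(errors)


# Report on the same scale as the table (multiplied by 10^4).
print(1e4 * mise())
```

Fixing the seed makes the estimate reproducible; raising `n_realisations` to 100 matches the number of realisations stated in the caption at a higher run time.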

Table 5.

Detection of modality: Number of realisations out of 100 in which the modality of the density is correctly identified.

The upper row refers to modes, the lower row to bumps. Modes and bumps are counted on the indicated interval. For test cases 14 and 15, the semiparametric estimators allow for logarithmic boundary terms at any boundary point. The best method is indicated in bold. For test case 8, the diffusion estimators give reasonable density estimates in most realisations but are unstable in some realisations. See S3 Appendix for further explanation.
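
Counting modes and bumps on an interval can be sketched on a grid. The sketch below assumes the common definitions, which are not spelled out in this caption: a mode is a local maximum of the density, and a bump is a maximal interval on which the density is concave. The function name, the grid resolution and the two example densities are hypothetical choices for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def count_modes_and_bumps(density, lo, hi, n_grid=1201):
    """Count local maxima and maximal concave intervals of density on [lo, hi]."""
    h = (hi - lo) / (n_grid - 1)
    f = [density(lo + i * h) for i in range(n_grid)]

    # A mode is an interior grid point that is higher than its left neighbour
    # and at least as high as its right neighbour.
    modes = sum(1 for i in range(1, n_grid - 1) if f[i - 1] < f[i] >= f[i + 1])

    # A bump is a run of consecutive grid points with a negative second
    # difference, i.e. a stretch where the density is concave.
    bumps = 0
    in_bump = False
    for i in range(1, n_grid - 1):
        concave = f[i - 1] - 2.0 * f[i] + f[i + 1] < 0.0
        if concave and not in_bump:
            bumps += 1
        in_bump = concave
    return modes, bumps


def unimodal(x):
    return normal_pdf(x, 0.0, 1.0)


def bimodal(x):
    return 0.5 * normal_pdf(x, -2.0, 1.0) + 0.5 * normal_pdf(x, 2.0, 1.0)


print(count_modes_and_bumps(unimodal, -6.0, 6.0))
print(count_modes_and_bumps(bimodal, -6.0, 6.0))
```

Applied to an estimated density, small spurious wiggles show up as extra modes or bumps, which is why the counting interval matters.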

Fig 1.

Test case 2: Example of density estimates obtained from the kernel density and diffusion estimators (left) as well as from the semiparametric and local likelihood estimators (right).

The SPDE is the model SPL2 with 2 knots after knot deletion. The sample size is N = 2000.

Fig 2.

Test case 6: Example of density estimates obtained from the kernel density and diffusion estimators (left) as well as from the semiparametric and local likelihood estimators (right).

The SPDE is the model SPL2 with 3 knots after knot deletion. The sample size is N = 2000.

Fig 3.

Test case 6: Left: Example of model selection, corresponding to the example shown in Fig 2.

The sample size is N = 2000. Right: Relative frequency, out of 250 realisations, with which each model is picked as the SPDE. The spline models encompass models with and without knot deletion.

Table 6.

Test case 14: Number of realisations out of 100 in which certain combinations of logarithmic boundary terms are chosen for the various semiparametric estimators and different sample sizes.

Table 7.

Test case 15: Number of realisations out of 100 in which no boundary term or a logarithmic boundary term is chosen for the various semiparametric estimators and different sample sizes.

Table 8.

Effect of boundary terms on model accuracy: Mean integrated squared error (scaled by 10⁴) estimated as an average over 100 realisations.

The upper row refers to models without boundary terms, the lower row to models allowing for logarithmic boundary terms at any boundary point.

Table 9.

Boundary bias (scaled by 10⁴) estimated as the mean over 100 realisations.

The true density at the boundary is f_X(−2) = 0.0427 and f_X(2) = 0.1450 for test case 13, f_X(0) = f_X(1) = 0 for test case 14, and f_X(0) = 0 for test case 15. For the semiparametric estimators and test cases 14 and 15, the upper row refers to models without boundary terms and the lower row to models allowing for logarithmic boundary terms at any boundary point. The best method is indicated in bold.

Fig 4.

Test case 15: Example of density estimates from selected methods (left) and corresponding model selection for the SPDE (right).

To keep the plot readable, only the best spline model after each knot deletion step is shown. The SPDE is the model SPL2 with 4 knots after knot deletion, augmented with a logarithmic boundary term. The sample size is N = 2000.

Fig 5.

Old Faithful Geyser eruption data: Density estimates of the eruption duration (left) and waiting time (right) from various methods.

For the duration data, the SPDE is the model with potential function U(y) = α₁ log y + α₂(1/y) + α₃ y + α₄ y²; for the waiting time data, it is the model SPL2 with 1 knot after knot deletion.

Table 10.

Old Faithful Geyser data: Bayesian information criterion, cross-validated log-likelihood, number of modes and number of bumps for various density estimators.

The number of modes and bumps is counted on the interval [1.25, 5.5] for the eruption duration data and on the interval [35, 100] for the waiting time data. The best method is indicated in bold.
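
For orientation, the two likelihood-based scores named in this caption can be sketched for a simple parametric fit. The sketch assumes the standard form BIC = k ln N − 2 ln L (lower is better) and a k-fold cross-validated log-likelihood averaged per observation. The sign convention, the number of folds, the normal model and the synthetic data are all assumptions of this example, not the settings or data behind the table.

```python
import math
import random
import statistics


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion in the form k ln N - 2 ln L."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def cv_log_likelihood(data, n_folds=5):
    """Mean held-out log-likelihood per observation for a normal fit."""
    total = 0.0
    for k in range(n_folds):
        held_out = data[k::n_folds]
        train = [x for i, x in enumerate(data) if i % n_folds != k]
        # Maximum likelihood estimates on the training folds.
        mu = statistics.fmean(train)
        sigma = statistics.pstdev(train)
        total += sum(normal_logpdf(x, mu, sigma) for x in held_out)
    return total / len(data)


# Synthetic illustration only: 500 draws from a standard normal.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]
mu_hat = statistics.fmean(data)
sigma_hat = statistics.pstdev(data)
log_lik = sum(normal_logpdf(x, mu_hat, sigma_hat) for x in data)

print(bic(log_lik, n_params=2, n_obs=len(data)))
print(cv_log_likelihood(data))
```

The same two functions apply to any density estimator that exposes a log-density and an effective parameter count; only the fitting step inside the loop changes.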
