Updated fiducial distribution of parameters in the associated delta-lognormal population

Yufan Wang; Xingzhong Xu

doi:10.1371/journal.pone.0298307

Abstract

In this paper we consider a special kind of semicontinuous distribution. We address the situation where the probability of a zero observation is associated with the location and scale parameters of the lognormal distribution. We first propose a goodness-of-fit test to check that the data can be fitted by the associated delta-lognormal distribution. We then define the updated fiducial distributions of the parameters and show that the resulting confidence intervals have asymptotically correct levels and that the significance levels of the hypothesis tests are also asymptotically correct. We propose an exact sampling method to sample from the updated fiducial distribution. Our simulation study shows that inference on the parameters is substantially improved. A real data example is also used to illustrate our method.

Citation: Wang Y, Xu X (2024) Updated fiducial distribution of parameters in the associated delta-lognormal population. PLoS ONE 19(6): e0298307. https://doi.org/10.1371/journal.pone.0298307

Editor: Jiangtao Gou, Villanova University, UNITED STATES

Received: August 8, 2023; Accepted: January 17, 2024; Published: June 5, 2024

Copyright: © 2024 Wang, Xu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript.

Funding: One author received the National Natural Science Foundation of China: 11471035 and 11471030, URL: https://www.nsfc.gov.cn The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

In real applications, such as fisheries research and medical cost analysis, the response variables may be skewed, non-negative and have a non-negligible probability of zero outcomes. Such variables are said to follow a semicontinuous distribution, with cumulative distribution function F(x) = δ + (1 − δ)G(x), x ≥ 0, where δ is the probability of a zero outcome and G(x) is the cumulative distribution function of a positive variable. In existing research, δ is usually assumed to be independent of G(x). However, we think that the probability δ is associated with G. For example, for precipitation in different areas, areas with larger annual rainfall are likely to have fewer dry days. Hence, we can assume that δ is associated with G, say δ = G(a), for some a. In this paper, we deal with this assumption for a specified distribution, the delta-lognormal distribution. This kind of distribution was first discussed and named by [1]. The cumulative distribution function of the delta-lognormal distribution is defined as F(x; δ, μ, σ) = δ + (1 − δ)F_LN(x; μ, σ), x ≥ 0, where F_LN(x; μ, σ) denotes the cumulative distribution function of a lognormal distribution. Since log X then follows a normal distribution N(μ, σ²), we still refer to μ and σ as the location and scale parameters respectively in the rest of our paper.

[2] applied this distribution to the measurement of worker exposure to air contaminants in the United States. The delta-lognormal distribution was applied to fisheries research by [3–5], who considered estimates of the population mean of the delta-lognormal distribution and further studied their robustness. It is easy to calculate the mean of the delta-lognormal distribution as M = (1 − δ) exp(μ + σ²/2). Much attention has been given to the confidence interval of M. [6] proposed using the likelihood ratio test to get better control of the Type I error than the standard ANOVA F-test and the Kruskal-Wallis test. A bootstrap approach, which is proved to be second-order accurate, is proposed in [7]. [8] considered the case when at least two nonzero observations are observed and modified the profile log-likelihood function. [9, 10] used the generalized pivotal quantities proposed by [11] to construct a generalized pivot for estimating the mean. In their papers, a Beta distribution is used as the generalized pivot for δ. This idea is further developed by [12–14]. In the papers mentioned above, generalized pivotal quantities are proposed for the binomial variable, which is discrete. Meanwhile, the conclusions of [15] on generalized fiducial inference also motivate some new ideas. The most recent results are given in [14], where the authors focus mainly on improving the Beta distribution used to approximate the generalized fiducial distribution of δ.

Instead of finding a generalized fiducial distribution, [16] proposed another method, named the “method of variance estimates recovery” (MOVER). This method can be easily applied to many different settings while guaranteeing the coverage probability of the confidence interval. From the Bayesian perspective, [17] compared the performance of different prior distributions for both the lognormal distribution and the delta-lognormal distribution. They further considered the comparison of the means of two lognormal populations.

As we can see from the review above, the three parameters of the delta-lognormal distribution are assumed to be independent. However, in real applications, the probability of zero outcomes may be associated with the location and scale parameters. Consider the spending on children’s clothing in [1]: a family in a rich community is more likely to be a spender, while one in a relatively poor community may be a nonspender, since a family is easily influenced by other families in the same community. It is natural to assume that the probability of a nonspender in a community with large μ and σ is smaller than that in a community with small μ and σ. Similar cases illustrate that in real applications, δ may depend on the other two parameters. We refer to this special kind of distribution as an associated delta-lognormal distribution. Thus, we can learn information about μ and σ from both the nonzero observations and the number of zero observations.

Assume that δ is a known function of μ and σ. The unknown parameters of the associated delta-lognormal distribution thus become (μ, σ). In this paper we give the fiducial distributions and make inference on the parameters. The idea is that we first obtain the fiducial distributions from the nonzero observations and then update them using the number of zero observations, which follows a binomial distribution whose success rate is δ(μ, σ). The approach of updating is motivated by Bayes’ theorem. The fiducial distribution of (μ, σ) from the nonzero observations is regarded as the “prior distribution”, and is combined with the binomial likelihood to get the “posterior distribution”, which is referred to as the updated fiducial distribution. We further make inference on μ, σ and functions of them through this updated fiducial distribution. The updated fiducial distributions of (μ, σ) are not derived from statistics that are asymptotically normal, so the asymptotic results for fiducial distributions given by [15] are no longer applicable here. Coincidentally, the updated fiducial distribution is the posterior distribution under the prior 1/σ. We show that this updated fiducial distribution satisfies the Bernstein-von Mises theorem. Then we show that the marginal fiducial distributions of the parametric functions are asymptotic confidence distributions as defined in [18]. Therefore, the confidence intervals of the parametric functions have asymptotically correct confidence levels. The significance levels of the hypothesis tests are also asymptotically correct. To deal with the computation, we employ rejection sampling motivated by the approximate Bayesian computation method, see [19–21]. Though more sophisticated sampling methods exist, our method remains attractive because of its simplicity and exactness. We show in the simulation study that inference can be substantially improved by combining the continuous and discrete data.

The rest of the article is organized as follows. In Section 2, we introduce the associated delta-lognormal distribution and propose the updated fiducial distribution of the parameters. We further present approaches of confidence interval estimation and hypothesis testing of the parameters. Their frequentist properties are also given. We conduct simulations in Section 3 and use a real data example to illustrate our method in Section 4. We give our conclusion in the last section.

2 Methodology: Associated delta-lognormal distribution

In the articles mentioned earlier, the three parameters of the delta-lognormal distribution are always assumed to be independent. In this section, we consider the case when δ is associated with θ = (μ, σ). We assume that δ is a function of the location and scale parameters, denoted by δ(μ, σ). This means that an observation in a sample generated from the distribution is 0 with probability δ(μ, σ), and the nonzero observations follow a lognormal distribution with parameters μ and σ, which is denoted by LN(μ, σ). The cumulative distribution function of the associated delta-lognormal population is F(x; μ, σ) = δ(μ, σ) + (1 − δ(μ, σ))F_LN(x; μ, σ), x ≥ 0, where F_LN(x; μ, σ) is the cumulative distribution function of LN(μ, σ).

A sample from this population is denoted by X = (X₁, X₂, ⋯, X_n). We assume that N₀ observations are zero while the remaining N₁ = n − N₀ are nonzero. The likelihood function for the number of zero observations N₀ is

$$L(\mu, \sigma \mid n_0) = \binom{n}{n_0}\, \delta(\mu, \sigma)^{n_0} \left(1 - \delta(\mu, \sigma)\right)^{n_1}, \tag{1}$$

where n₀ is the observed value of N₀ and n₁ = n − n₀.
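As an illustration of the binomial likelihood for the number of zeros, the following minimal Python sketch evaluates it for a given association function. The function names and the choice δ(μ, σ) = Φ(−μ/σ) are illustrative assumptions here, not part of the definition itself.

```python
from math import comb
from statistics import NormalDist


def zero_count_likelihood(n0, n, delta):
    """Binomial likelihood of observing n0 zeros among n observations,
    where delta is the probability of a zero observation."""
    return comb(n, n0) * delta**n0 * (1.0 - delta) ** (n - n0)


def delta_example(mu, sigma):
    """Illustrative association (an assumption): delta = Phi(-mu / sigma)."""
    return NormalDist().cdf(-mu / sigma)


# Likelihood of 10 zeros out of 40 observations at (mu, sigma) = (0.5, 1).
lik = zero_count_likelihood(n0=10, n=40, delta=delta_example(0.5, 1.0))
```

Because δ depends on (μ, σ), this likelihood carries information about the lognormal parameters even though it involves only the count of zeros.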

2.1 Updated fiducial distribution

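Without loss of generality, we assume that the first N₁ observations are nonzero, while the rest are 0, that is, X = (X₁, ⋯, X_{N₁}, 0, ⋯, 0). Given N₁ = n₁, the nonzero observations are from LN(μ, σ). Let n₁ ≥ 2. A log-transformation is made to the observations, Y_i = log X_i for i = 1, 2, ⋯, n₁. Then the sample mean Ȳ and the sample variance S² follow a normal and a scaled χ²(n₁ − 1) distribution respectively, that is,

$$\bar{Y} \sim N\left(\mu, \frac{\sigma^2}{n_1}\right), \qquad \frac{(n_1 - 1) S^2}{\sigma^2} \sim \chi^2(n_1 - 1).$$

Let U ∼ N(0, 1) and V ∼ χ²(n₁ − 1) be two independent random variables. Then we have

$$\bar{Y} = \mu + \frac{\sigma}{\sqrt{n_1}}\, U, \qquad (n_1 - 1) S^2 = \sigma^2 V.$$

Given Ȳ = ȳ and S² = s², μ and σ can be regarded as functions of U and V,

$$\mu = \bar{y} - \frac{\sigma}{\sqrt{n_1}}\, U, \qquad \sigma = s \sqrt{\frac{n_1 - 1}{V}}.$$

The joint density of (U, V) is

$$f(u, v) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2} \cdot \frac{v^{(n_1 - 1)/2 - 1}\, e^{-v/2}}{2^{(n_1 - 1)/2}\, \Gamma\left(\frac{n_1 - 1}{2}\right)}, \qquad -\infty < u < \infty,\ v > 0.$$

Then the joint density of (μ, σ) can be calculated as

$$\pi^{F}(\mu, \sigma \mid x_{obs}) = C\, \sigma^{-(n_1 + 1)} \exp\left\{ -\frac{(n_1 - 1) s^2 + n_1 (\mu - \bar{y})^2}{2 \sigma^2} \right\}, \tag{2}$$

where $C = \sqrt{n_1}\, [(n_1 - 1) s^2]^{(n_1 - 1)/2} \big/ \left[ \sqrt{\pi}\, 2^{n_1/2 - 1}\, \Gamma\left(\frac{n_1 - 1}{2}\right) \right]$ is the normalizing constant.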

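This means that the fiducial distribution of (μ, σ) is

$$\mu = \bar{y} - \frac{\sigma}{\sqrt{n_1}}\, U, \qquad \sigma^2 = \frac{(n_1 - 1) s^2}{V}, \tag{3}$$

where U ∼ N(0, 1) and V ∼ χ²(n₁ − 1) are independent.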
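The construction above can be sketched in Python as follows. This is a minimal illustration under the assumption n₁ ≥ 2, using the representation μ = ȳ − σU/√n₁ and σ² = (n₁ − 1)s²/V; the function name `sample_fiducial` is hypothetical, and the χ²(n₁ − 1) draw is obtained as a Gamma variate with shape (n₁ − 1)/2 and scale 2.

```python
import math
import random


def sample_fiducial(ybar, s2, n1, rng=random):
    """Draw one (mu, sigma) pair from the fiducial distribution of the
    lognormal parameters, given the mean ybar and variance s2 of the
    n1 >= 2 log-transformed nonzero observations."""
    u = rng.gauss(0.0, 1.0)                     # U ~ N(0, 1)
    v = rng.gammavariate((n1 - 1) / 2.0, 2.0)   # V ~ chi-square(n1 - 1)
    sigma = math.sqrt((n1 - 1) * s2 / v)
    mu = ybar - sigma * u / math.sqrt(n1)
    return mu, sigma
```

Repeated calls give independent draws, whose empirical quantiles approximate the marginal fiducial distributions of μ and σ.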

If n₁ < 2, we take (4) where when n₁ = 0 and when n₁ = 1. Then the fiducial density π^F(μ, σ|x_obs) in (2) is obtained for all n₁ ≥ 0.

The fiducial distribution for the lognormal distribution was first given by [22]. However, there is no commonly accepted fiducial distribution for a binomial variable. A generalized fiducial quantity is proposed by [15], which is a Beta distribution Beta(n₀, n₁ + 1). Other improvements to the parameters of the Beta distribution were further proposed by [12, 14], which are 0.5[Beta(n₀, n₁ + 1) + Beta(n₀ + 1, n₁)] and Beta(n₀ + 0.5, n₁ + 0.5), respectively.

Now we consider the problem from the Bayesian perspective, without the need for generalized fiducial quantities. In Bayesian inference, the prior beliefs about the model parameters θ, say π(θ), are updated by observing data y_obs through the likelihood function of the model. We denote the likelihood function by p(y_obs|θ) and use Bayes’ theorem to get the posterior distribution

$$\pi(\theta \mid y_{obs}) = \frac{p(y_{obs} \mid \theta)\, \pi(\theta)}{\int p(y_{obs} \mid \theta)\, \pi(\theta)\, d\theta}. \tag{5}$$

The prior distribution is often specified by choosing some tractable distribution that we believe the parameters should obey. For the associated delta-lognormal distribution, the prior distribution of (μ, σ) is naturally chosen to be the fiducial distribution (2), which is further updated by the likelihood function (1). We define the updated fiducial distribution of (μ, σ) as

$$\pi^{UF}(\mu, \sigma \mid x_{obs}) \propto \pi^{F}(\mu, \sigma \mid x_{obs})\, \delta(\mu, \sigma)^{n_0} \left(1 - \delta(\mu, \sigma)\right)^{n_1}, \tag{6}$$

where “∝” means “proportional to”.
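To make the updating step concrete, the sketch below evaluates an unnormalized log density of the updated fiducial distribution for n₁ ≥ 2. It assumes that the fiducial density of (μ, σ) is proportional to σ^−(n₁+1) exp{−[(n₁ − 1)s² + n₁(μ − ȳ)²]/(2σ²)}, which is the standard form for normal data; the function name and argument layout are hypothetical.

```python
import math


def log_updated_fiducial(mu, sigma, ybar, s2, n0, n1, delta):
    """Unnormalized log density of the updated fiducial distribution.

    delta is a callable (mu, sigma) -> probability of a zero observation.
    Assumes n1 >= 2; ybar and s2 are the mean and variance of the
    log-transformed nonzero observations.
    """
    if sigma <= 0:
        return -math.inf
    d = delta(mu, sigma)
    if not 0.0 < d < 1.0:
        return -math.inf
    # Fiducial part: the "prior" built from the nonzero observations.
    log_fid = -(n1 + 1) * math.log(sigma) - (
        (n1 - 1) * s2 + n1 * (mu - ybar) ** 2
    ) / (2.0 * sigma**2)
    # Binomial part: the update from the number of zeros.
    log_bin = n0 * math.log(d) + n1 * math.log(1.0 - d)
    return log_fid + log_bin
```

Such a function is useful for plotting or for checking a sampler, although the inference described here only needs draws, not density evaluations.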

2.2 Goodness-of-fit test

Let the observations be x₁, x₂, ⋯, x_n. We take δ = G(x₀), where x₀ is a preset value and G is the cumulative distribution function of the continuous part. In this paper, we consider the case when G is the lognormal distribution, so that δ(μ, σ) = Φ((log x₀ − μ)/σ), where Φ is the standard normal cumulative distribution function.

In real applications, x₀ may be known. For example, in the Tobit model, see [23], x₀ = y_min. When x₀ is unknown, we can obtain x₀ with the following method.

Let n₀ and n₁ be the numbers of zero and nonzero observations, respectively. Without loss of generality, let be the nonzero ones. Then μ and σ are estimated by Then Let , then Thus, the associated delta can be given by
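One way to carry out this step is sketched below. It rests on assumptions that should be checked against the intended estimators: μ and σ are estimated by the sample mean and the sample standard deviation (divisor n₁ − 1) of the log-transformed nonzero observations, and x₀ is chosen so that Φ((log x₀ − μ̂)/σ̂) equals the observed proportion of zeros n₀/n. The function name `estimate_x0` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def estimate_x0(data):
    """Estimate x0 by matching the fitted lognormal CDF at x0 to the
    observed proportion of zeros (requires 0 < n0 < n and n1 >= 2)."""
    nonzero = [x for x in data if x > 0]
    n0 = len(data) - len(nonzero)
    logs = [math.log(x) for x in nonzero]
    mu_hat = mean(logs)
    sigma_hat = stdev(logs)          # divisor n1 - 1 (an assumption)
    p0 = n0 / len(data)              # observed proportion of zeros
    # Solve Phi((log x0 - mu_hat) / sigma_hat) = p0 for x0.
    return math.exp(mu_hat + sigma_hat * NormalDist().inv_cdf(p0))
```

With x₀ fixed at this value, the association δ(μ, σ) = Φ((log x₀ − μ)/σ) is a known function of (μ, σ), as the method requires.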

To test the goodness-of-fit, the classical Kolmogorov-Smirnov test is no longer suitable in the zero-inflated model. We consider using Pearson’s chi-square test. The following partition is made on the interval [0, ∞), which is 0, (0, a₁], (a₁, a₂], ⋯, (a_k, ∞). Let where a₀ = 0. Then p₀, p₁, ⋯, p_k are estimated by Let m_i be the number of observations in the interval (a_i, a_{i+1}], where a_{k+1} = ∞. We can then construct the following test statistic, (7) Then Given the significance level α, the model of associated delta-lognormal distribution is accepted when
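A Pearson-type statistic for this partition can be sketched as follows. This is an illustration under assumptions: the cells are taken to be {0}, (0, a₁], ⋯, (a_k, ∞), the expected cell probabilities are δ for the zero cell and (1 − δ) times the lognormal probability of each interval, and plug-in values of μ, σ and δ are supplied by the caller. The function name is hypothetical, and the reference distribution and its degrees of freedom are deliberately left to the reader.

```python
import math
from statistics import NormalDist


def pearson_statistic(data, cuts, mu, sigma, delta):
    """Pearson chi-square statistic for the cells {0}, (0, a1], ..., (ak, inf)."""
    n = len(data)
    phi = NormalDist().cdf

    def cdf_ln(a):
        # Lognormal CDF with parameters mu and sigma.
        return phi((math.log(a) - mu) / sigma)

    edges = [0.0] + list(cuts) + [math.inf]
    probs = [delta]     # expected probability of the zero cell
    counts = [sum(1 for x in data if x == 0)]
    for lo, hi in zip(edges[:-1], edges[1:]):
        upper = 1.0 if hi == math.inf else cdf_ln(hi)
        lower = 0.0 if lo == 0.0 else cdf_ln(lo)
        probs.append((1.0 - delta) * (upper - lower))
        counts.append(sum(1 for x in data if lo < x <= hi))
    return sum((m - n * p) ** 2 / (n * p) for m, p in zip(counts, probs))
```

The returned value is compared with an upper chi-square quantile at the chosen level α; a value below the quantile supports the associated delta-lognormal model.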

2.3 Inference on functions of parameters

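Assume that (μ, σ) follows the updated fiducial distribution π^UF(μ, σ|x_obs). Let G = g(μ, σ), which is a random variable. We denote the marginal updated fiducial density of g(μ, σ) by $\pi^{UF}_G(g \mid x_{obs})$ and the cumulative distribution function of G by $F^{UF}_G(g \mid x_{obs})$.

Confidence interval.

The confidence interval of g(μ, σ) with confidence level 1 − α is given by

$$\left[\, g_{\alpha/2},\ g_{1 - \alpha/2} \,\right], \tag{8}$$

where $g_\gamma$, 0 < γ < 1, satisfies $F^{UF}_G(g_\gamma \mid x_{obs}) = \gamma$.

Hypothesis testing.

For the one-sided hypothesis H₀ : g(μ, σ) ≤ g₀ versus H₁ : g(μ, σ) > g₀, the p-value is defined as

$$p(x_{obs}) = F^{UF}_G(g_0 \mid x_{obs}). \tag{9}$$

For the two-sided hypothesis H₀ : g(μ, σ) = g₀ versus H₁ : g(μ, σ) ≠ g₀, the p-value is then given by

$$p(x_{obs}) = 2 \min\left\{ F^{UF}_G(g_0 \mid x_{obs}),\ 1 - F^{UF}_G(g_0 \mid x_{obs}) \right\}. \tag{10}$$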
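When draws of g(μ, σ) from the updated fiducial distribution are available, these p-values can be approximated by the empirical distribution function of the draws. The sketch below assumes the one-sided null hypothesis is g(μ, σ) ≤ g₀, with p-value equal to the fiducial distribution function at g₀, and that the two-sided p-value is twice the smaller tail; the function name is hypothetical.

```python
def fiducial_p_values(draws, g0):
    """Empirical one-sided and two-sided p-values for a scalar function
    g(mu, sigma), based on draws from its marginal updated fiducial
    distribution."""
    n = len(draws)
    cdf_at_g0 = sum(1 for g in draws if g <= g0) / n   # empirical CDF at g0
    one_sided = cdf_at_g0                               # H0: g <= g0 (assumed)
    two_sided = 2.0 * min(cdf_at_g0, 1.0 - cdf_at_g0)   # H0: g = g0
    return one_sided, two_sided
```

The null hypothesis is rejected when the p-value does not exceed the chosen level α.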

Now we investigate the frequentist properties of the confidence interval and the hypothesis testing. First we define the random variable Z_i as Z_i = 1 if X_i = 0 and Z_i = 0 if X_i > 0, where i = 1, 2, ⋯, n. Then (Z₁, X₁), (Z₂, X₂), ⋯, (Z_n, X_n) are independently and identically distributed as f(z, x; μ, σ) given below. The population sample space is then and the dominating measure , where is the counting measure on {0, 1} and LN(0, 1) is the standard log-normal distribution, which has the density (1/(x√(2π))) exp{−(log x)²/2} for x > 0. When x = 0, we define the function above as its limit 0.

The density f(z, x; μ, σ) with respect to ν is (11) where , θ = (μ, σ) ∈ Ω = (−∞, ∞) × (0, ∞).

We first check that f(z, x; μ, σ) is a probability density function. It can be seen that when Z = 1, X = 0, the density is while when Z = 0, X = x, the density becomes

Then we integrate f(z, x; μ, σ) on with respect to ν This indicates that f(z, x; μ, σ) is a density function with respect to ν.

Then we show that the family (11) is quadratic mean differentiable, which is defined below.

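Definition 1 (Quadratic Mean Differentiable) The family {P_θ, θ ∈ Ω}, with densities p_θ with respect to a dominating measure ν, is quadratic mean differentiable at θ₀ if there exists a vector of real-valued functions $\dot{\ell}_{\theta_0}$ such that, as θ → θ₀,

$$\int \left[ \sqrt{p_\theta} - \sqrt{p_{\theta_0}} - \frac{1}{2} (\theta - \theta_0)^{T} \dot{\ell}_{\theta_0} \sqrt{p_{\theta_0}} \right]^2 d\nu = o\left( \|\theta - \theta_0\|^2 \right).$$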

To verify that a family is quadratic mean differentiable, a lemma below is used in this paper.

Lemma 1 ([24]). For every θ in an open subset of R^k, let p_θ be a probability density. Assume that the map θ ↦ √p_θ(x) is continuously differentiable for every x. If the elements of the Fisher information matrix I_θ are well defined and continuous in θ, then the density p_θ is quadratic mean differentiable.

Hence we can establish the following proposition.

Proposition 2 Assume that 0 < δ(μ, σ) < 1 and δ(μ, σ) is continuously differentiable for all −∞ < μ < +∞ and σ > 0. Then the density f(z, x; μ, σ) is differentiable in quadratic mean.

The proof of this proposition is given in S1 File.

Given the observations (z₁, x₁), ⋯, (z_n, x_n), we have the likelihood function

$$L(\mu, \sigma) = \prod_{i=1}^{n} f(z_i, x_i; \mu, \sigma).$$

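Notice that when n₁ ≥ 2, the updated fiducial distribution has the form

$$\pi^{UF}(\mu, \sigma \mid x_{obs}) \propto \sigma^{-(n_1 + 1)} \exp\left\{ -\frac{\sum_{i=1}^{n_1} (y_i - \mu)^2}{2 \sigma^2} \right\} \delta(\mu, \sigma)^{n_0} \left(1 - \delta(\mu, \sigma)\right)^{n_1},$$

where y = log x. With simple calculation we can get

$$\pi^{UF}(\mu, \sigma \mid x_{obs}) \propto \frac{1}{\sigma} \prod_{i=1}^{n} f(z_i, x_i; \mu, \sigma).$$

This means that the updated fiducial distribution can be regarded as a posterior distribution under the prior distribution 1/σ.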

When n → ∞, (12) Therefore we can apply the famous Bernstein-von Mises Theorem below to the updated fiducial distribution.

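Lemma 3 (Bernstein-von Mises Theorem, [24]) Let the experiment (P_θ : θ ∈ Ω) be differentiable in quadratic mean at θ₀ with nonsingular Fisher information matrix $I_{\theta_0}$, and suppose that for every ε > 0 there exists a sequence of tests ψ_n such that

$$P^n_{\theta_0} \psi_n \to 0, \qquad \sup_{\|\theta - \theta_0\| \ge \varepsilon} P^n_{\theta} (1 - \psi_n) \to 0.$$

Furthermore, let the prior measure be absolutely continuous in a neighborhood of θ₀ with a continuous positive density at θ₀. Then the corresponding posterior distributions satisfy

$$\left\| \pi_n(\cdot \mid X_1, \ldots, X_n) - N\left( \Delta_{n, \theta_0},\ I_{\theta_0}^{-1} \right) \right\| \xrightarrow{P^n_{\theta_0}} 0. \tag{13}$$

We now explain the notation in (13). The symbol $\pi_n(\cdot \mid X_1, \ldots, X_n)$ is the posterior density of $\sqrt{n}(\theta - \theta_0)$, while $N(\Delta_{n, \theta_0}, I_{\theta_0}^{-1})$ is a normal distribution with mean $\Delta_{n, \theta_0} = n^{-1/2} \sum_{i=1}^{n} I_{\theta_0}^{-1} \dot{\ell}_{\theta_0}(X_i)$, where $\dot{\ell}_{\theta_0}$ is the score function at θ₀, and variance $I_{\theta_0}^{-1}$. The norm ‖f − g‖ means $\int |f - g|$, which is the L¹ distance between the densities f and g. Thus we can obtain the result below.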

Theorem 4 Under the assumptions of Proposition 2, Bernstein-von Mises theorem holds when the posterior distribution is replaced by the updated fiducial distribution π^UF(μ, σ|x_obs).

The proof of Theorem 4 is given in S1 File.

To explore the frequentist properties of the functions of parameters under the updated fiducial distribution, we give the definitions of the confidence distribution and the asymptotic confidence distribution, which were proposed by [18].

Definition 2 A function H_n(⋅) = H_n(X_n, ⋅) on $\mathcal{X} \times \Theta \to [0, 1]$, where $\mathcal{X}$ is the sample space and Θ is the parameter space of θ, is called a confidence distribution for a parameter θ if (i) for each given $X_n \in \mathcal{X}$, H_n(⋅) is a continuous cumulative distribution function; (ii) at the true parameter value θ = θ₀, H_n(θ₀) = H_n(X_n, θ₀), as a function of the sample X_n, has the uniform distribution U(0, 1). The function H_n(⋅) is called an asymptotic confidence distribution if requirement (ii) above is replaced by (ii)’: at θ = θ₀, $H_n(\theta_0) \xrightarrow{d} U(0, 1)$ as n → +∞, and the continuity requirement on H_n(⋅) is dropped.

The notation “$\xrightarrow{d}$” means convergence in distribution.

Given n₁ ≥ 2, under the fiducial distribution (2), it is well known that the marginal fiducial distributions of μ and σ are confidence distributions. However, under the updated fiducial distribution (6), the fiducial distributions (2) are updated by the discrete variable N₁. Thus the marginal fiducial distributions are no longer confidence distributions. Besides μ and σ, we also consider some functions of them. We have the following theorem.

Theorem 5 Let g(μ, σ) = K(aμ + bσ), where K is a strictly monotone increasing function. Then under the assumptions of Proposition 2, the marginal updated fiducial distribution of g is an asymptotic confidence distribution.

The proof of Theorem 5 is given in S1 File.

Applying this theorem to different functions g(μ, σ), we get the corollary below.

Corollary 6 The marginal updated fiducial distributions of the following functions are all asymptotic confidence distributions:

(i) g₁(μ, σ) = μ;
(ii) g₂(μ, σ) = σ;
(iii) g₃(μ, σ) = exp[μ + Φ⁻¹(γ)σ], the γ quantile of LN(μ, σ);
(iv) g₄(μ, σ) = Φ[(log x₀ − μ)/σ], the cumulative distribution function of LN(μ, σ) at x₀;
(v) g₅(μ, σ) = (1 − δ(μ, σ)) exp (μ + σ²/2), the population mean, when (14)

The proof of Corollary 6 is given in S1 File.

An example of (v) in Corollary 6 is δ(μ, σ) = Φ(−μ/σ). We can see that which satisfies (14). The following proposition guarantees the level of both the confidence interval and the hypothesis testing.

Proposition 7 If the marginal updated fiducial distribution of g(μ, σ) is an asymptotic confidence distribution, then the level of the confidence interval is asymptotically 1 − α and the significance level of the hypothesis testing is asymptotically α.

The proof of Proposition 7 is given in S1 File.

From Proposition 7, if g(μ, σ) is taken as in Theorem 5 or Corollary 6, the confidence intervals in (8) and the p-values in (9) and (10) are asymptotically correct as n → ∞. For moderate sample sizes n, we give simulations in the next section.

2.4 Sampling from the updated fiducial distribution

To give the confidence intervals of the parameters, we need to compute the γ-quantiles of the updated fiducial distributions. Similarly, to give the p-values of the hypothesis tests, we need to compute the cumulative distribution functions of the marginal updated fiducial distributions at g₀. However, it is difficult to give closed forms for them. Fortunately, we can adopt a simple method to produce an exact sample from the updated fiducial distribution, which is known as the rejection sampling method.

We can draw parameters from the “prior distribution” and accept the ones that generate the same number of zeros as the observed data. This is similar to the rejection-ABC method first proposed by [25, 26]. However, it should be noticed that there is no approximation error in our sampling method for the associated delta-lognormal distribution, since we do not use summary statistics and accept only the parameters which generate exactly the same number of zeros. Thus the accepted parameters are equivalent to a sample from the exact posterior distribution (6).

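Without loss of generality, assume first that the observed sample of size n is x_obs = (x₁, ⋯, x_{n₁}, 0, ⋯, 0), where x_i > 0 for i = 1, ⋯, n₁ and the remaining n₀ = n − n₁ observations are zero. A log-transformation is then made to the nonzero observations, y_i = log x_i, i = 1, ⋯, n₁. Then the fiducial distributions of μ and σ are given by (3), that is,

$$\mu = \bar{y} - \frac{\sigma}{\sqrt{n_1}}\, U, \qquad \sigma^2 = \frac{(n_1 - 1) s^2}{V}, \tag{15}$$

where ȳ and s² are the sample mean and sample variance of y₁, ⋯, y_{n₁}, U is a standard normal random variable and V is an independent χ²(n₁ − 1) random variable.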

1. Apply the log-transformation to the nonzero observations, denoted by y₁, ⋯, y_{n₁}, and calculate the sample mean ȳ and the sample variance s².
2. If n₁ ≥ 2, sample U from the standard normal distribution and V from the χ²(n₁ − 1) distribution, and calculate μ and σ² using (15); this gives a draw from the fiducial distribution of the parameters. If n₁ < 2, draw samples from (4).
3. Calculate δ = δ(μ, σ) and draw the number of zeros from the binomial distribution B(n, δ(μ, σ)). Accept (μ, σ) if the number of zeros equals n₀.
4. Repeat steps 2 and 3 until the required number of parameters has been accepted.
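The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it covers only the case n₁ ≥ 2, it uses the representation μ = ȳ − σU/√n₁ and σ² = (n₁ − 1)s²/V for the fiducial draw, and the function name `updated_fiducial_sample` and its arguments are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def updated_fiducial_sample(data, delta, n_accept, rng=None):
    """Rejection sampler for the updated fiducial distribution of (mu, sigma).

    data     : observations, zeros included
    delta    : callable (mu, sigma) -> probability of a zero observation
    n_accept : number of accepted pairs to return
    """
    rng = rng or random.Random()
    logs = [math.log(x) for x in data if x > 0]
    n, n1 = len(data), len(logs)
    n0 = n - n1
    if n1 < 2:
        raise ValueError("this sketch covers only n1 >= 2")
    ybar, s2 = mean(logs), variance(logs)
    accepted = []
    while len(accepted) < n_accept:
        # Draw (mu, sigma) from the fiducial distribution.
        u = rng.gauss(0.0, 1.0)
        v = rng.gammavariate((n1 - 1) / 2.0, 2.0)   # chi-square(n1 - 1)
        sigma = math.sqrt((n1 - 1) * s2 / v)
        mu = ybar - sigma * u / math.sqrt(n1)
        # Simulate the number of zeros from B(n, delta(mu, sigma)).
        d = delta(mu, sigma)
        zeros = sum(1 for _ in range(n) if rng.random() < d)
        # Keep the pair only if the simulated count matches the data.
        if zeros == n0:
            accepted.append((mu, sigma))
    return accepted


def delta_phi(mu, sigma):
    """Example association (an assumption): delta = Phi(-mu / sigma)."""
    return NormalDist().cdf(-mu / sigma)
```

Because acceptance requires an exact match of the zero count, the acceptance rate falls as n grows, so the running time per accepted pair increases with the sample size.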

With the sample from the updated fiducial distribution, we then consider the inference on the scalar function g(μ, σ). We first assume that a certain number, say N, parameters are accepted using reject sampling method. We denote these parameters by (μ₁, σ₁), (μ₂, σ₂), ⋯, (μ_i, σ_i), ⋯, (μ_N, σ_N). For the function G = g(μ, σ), let g_i = g(μ_i, σ_i), i = 1, 2, ⋯, N.

Confidence interval.

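The confidence interval (8) of g(μ, σ) can be computed as follows. We sort g₁, g₂, ⋯, g_N in ascending order, g_(1) ≤ g_(2) ≤ ⋯ ≤ g_(N). Then we take

$$\left[\, g_{([N\alpha/2])},\ g_{([N(1 - \alpha/2)])} \,\right], \tag{16}$$

where [a] is the largest integer not larger than the number a.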
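A sketch of this computation is given below, assuming the interval is formed from the order statistics with ranks [Nα/2] and [N(1 − α/2)]. The function name is hypothetical, and the guard that keeps the rank at least 1 is an addition for very small N.

```python
import math


def fiducial_interval(draws, alpha=0.05):
    """Equal-tailed interval from draws of g(mu, sigma), using the order
    statistics with ranks floor(N * alpha / 2) and floor(N * (1 - alpha / 2))."""
    g = sorted(draws)
    n = len(g)
    lo_rank = max(math.floor(n * alpha / 2), 1)         # 1-based rank
    hi_rank = max(math.floor(n * (1 - alpha / 2)), 1)
    return g[lo_rank - 1], g[hi_rank - 1]
```

For μ or σ themselves, `draws` is simply the list of accepted μ_i or σ_i values.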

Hypothesis testing.

The first hypothesis concerns whether (μ, σ) lies in a nondegenerate region. This means that the null hypothesis is (μ, σ) ∈ Ω₀, where Ω₀ ⊂ ℜ × ℜ⁺. To test this hypothesis, we simply calculate the proportion of the accepted (μ_i, σ_i) contained in Ω₀ and take this value as the p-value, p = #{i : (μ_i, σ_i) ∈ Ω₀}/N, where #A means the number of elements of the set A.
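In code, this p-value is just the fraction of accepted pairs that fall in the null region. The following short sketch uses a hypothetical function name and represents Ω₀ by a user-supplied indicator function.

```python
def region_p_value(pairs, in_null_region):
    """Proportion of accepted (mu, sigma) pairs lying in the null region.

    in_null_region : callable (mu, sigma) -> bool, the indicator of Omega_0
    """
    hits = sum(1 for mu, sigma in pairs if in_null_region(mu, sigma))
    return hits / len(pairs)


# Example: null region {mu <= 0.2} with four illustrative pairs.
example = region_p_value(
    [(0.1, 1.0), (-0.2, 1.0), (0.3, 2.0), (0.5, 1.0)],
    lambda mu, sigma: mu <= 0.2,
)
```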

We also consider testing the null hypothesis H₀ : θ = θ₀ versus H₁ : θ ≠ θ₀. The p-value under the null hypothesis is then where Thus we reject the null hypothesis when the p-value is not larger than a given level α.

3 Simulation study

In this section we illustrate the performance of our confidence intervals and hypothesis tests when the sample size is moderate. We take δ(μ, σ) = Φ((a − μ)/σ). Without loss of generality, we take a = 0. Otherwise, for a nonzero observation X_i, let Y_i = X_i e^{−a}. Then log Y_i ∼ N(μ − a, σ²), which means that we take μ − a as the new location parameter. So we consider δ(μ, σ) = Φ(−μ/σ). We can sample from the updated fiducial distribution using the method we proposed. Three simulation studies are conducted in this section. The first shows the interval estimates of the parameters of the associated delta-lognormal distribution; we compare these with the intervals from the fiducial distributions to illustrate the improvement. In the second, we compare the estimates of the population mean under the associated delta-lognormal distribution and the traditional one when δ = 0.5. In the last, we focus on estimation and hypothesis testing for δ in the associated delta-lognormal distribution, and also compare the results with those of the traditional one.

3.1 Simulation study I

In this simulation study we consider the estimation of μ and σ. The sample sizes considered are n = 20, 30, 50 and 100. We set σ to 0.5, 1 and 2 and choose μ so that the corresponding δ = Φ(−μ/σ) is approximately equal to 0.6, 0.5, 0.4, 0.3 and 0.15. In particular, when σ = 1, the corresponding values of μ are −0.25, 0, 0.25, 0.5 and 1. For each parameter setting, we generate 1000 replications, and for each replication we sample N = 4000 pairs of (μ, σ) and calculate the 95% confidence intervals of μ and σ using Eq (16). We compare the results with the intervals obtained from the fiducial distribution; the results are shown in Figs 1 and 2, and the details are given in the tables in S1 File. The figures show the confidence intervals of μ and σ when σ = 1 and n = 20, 30, 50 and 100. The horizontal coordinates are the values of μ = −0.25, 0, 0.25, 0.5 and 1, while the vertical coordinates are the confidence intervals of μ or σ for different μ. The four plots from left to right and from top to bottom correspond to n = 20, 30, 50 and 100, respectively. The plots for σ = 1 and 2 are quite similar to those for 0.5, so we do not include them here.
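The correspondence between μ and δ in this design can be checked with a few lines of Python. The helper name `mu_for_delta` is hypothetical; it simply inverts δ = Φ(−μ/σ).

```python
from statistics import NormalDist


def mu_for_delta(delta, sigma):
    """Solve delta = Phi(-mu / sigma) for mu."""
    return -sigma * NormalDist().inv_cdf(delta)


# Zero probabilities implied by the mu values used when sigma = 1.
sigma = 1.0
deltas = {mu: NormalDist().cdf(-mu / sigma) for mu in (-0.25, 0.0, 0.25, 0.5, 1.0)}
```

The resulting values are close to 0.6, 0.5, 0.4, 0.3 and 0.15, in line with the settings described above.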

Fig 1. The confidence intervals of μ when μ = −0.25, 0, 0.25, 0.5, 1, σ = 0.5 and n = 20, 30, 50, 100.

https://doi.org/10.1371/journal.pone.0298307.g001

Fig 2. The confidence intervals of σ when μ = −0.25, 0, 0.25, 0.5, 1, σ = 0.5 and n = 20, 30, 50, 100.

https://doi.org/10.1371/journal.pone.0298307.g002

For this specific δ, we can see that the estimate of μ is largely improved. Compared with the fiducial distribution, the lower limits of μ become larger while the upper limits become smaller. This leads to a considerably shorter confidence interval while retaining the coverage probability. However, the impact on σ is not as apparent as that on μ. The average length of the confidence intervals for σ generally becomes smaller than that of the fiducial distribution as δ and the sample size n decrease. The lower limits seem to be always larger than those of the fiducial distribution, while the upper limits gradually become smaller as δ and the sample size increase. We also notice that the distribution of σ is asymmetric, so we suggest using the 2% and 97% quantiles of the sampled σ to construct the 95% confidence interval of σ.

3.2 Simulation study II

In this simulation we consider inference on the log population mean of the associated delta-lognormal distribution, which has the form

$$\log M = \log\left(1 - \delta(\mu, \sigma)\right) + \mu + \frac{\sigma^2}{2}. \tag{17}$$

The population mean of the delta-lognormal distribution plays a crucial role in statistical analysis and inference. It is a measure of central tendency, providing a summary of the central location of the distribution. For example, in the real data of our paper, we estimate the diagnostic test charges of the patients. The true values of the parameters and the sample sizes are set as in the last simulation. We first consider point estimates of the log population mean, namely the “posterior mean” and the “posterior median”. The former is approximated by

$$\frac{1}{N} \sum_{i=1}^{N} g_i, \tag{18}$$

where g_i is the log population mean (17) evaluated at the i-th accepted pair (μ_i, σ_i), while the latter is approximated by the 0.5 quantile of the N accepted values. We compute the mean bias and the mean squared error of these two estimates and compare them with those of [14]. To obtain the estimate of Krishnamoorthy, we compute the mean of “Qtheta” in his paper. The results for σ = 1 are shown in Table 1; those for σ = 0.5 and 2 can be found in S1 File. “MB”, “MDB” and “GQB” stand for the mean bias of the posterior mean, the posterior median and the estimate using the generalized quantity in [14]. “MSE” stands for the mean squared error, and the subscripts indicate the three estimates. We also use Fig 3 for a better view of the two estimates. It should be noticed that some extreme cases may occur when δ is large and the sample size is small, as shown in the first plot of Fig 3, where n = 20. In these extreme cases, there are only three or fewer nonzero observations, making the estimates far from the true value, so the mean bias and mean squared error become meaningless. We use blanks to indicate this problem. However, we can see that the posterior median seems to be a better point estimate of the population mean. The mean bias and the mean squared error are generally smaller, especially when σ is large.
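The two point estimates can be computed from accepted pairs as in the sketch below. It assumes the log population mean is log(1 − δ(μ, σ)) + μ + σ²/2, which follows from the population mean (1 − δ(μ, σ)) exp(μ + σ²/2); the function names are hypothetical.

```python
import math
from statistics import mean, median


def log_population_mean(mu, sigma, delta):
    """Log of the population mean (1 - delta) * exp(mu + sigma^2 / 2)."""
    return math.log(1.0 - delta(mu, sigma)) + mu + sigma**2 / 2.0


def point_estimates(pairs, delta):
    """'Posterior mean' and 'posterior median' of the log population mean,
    computed from accepted (mu, sigma) pairs."""
    g = [log_population_mean(m, s, delta) for m, s in pairs]
    return mean(g), median(g)
```

Since the draws of the log population mean can be skewed when few nonzero observations are available, the median is less sensitive to extreme accepted values than the mean.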

Fig 3. The confidence intervals of the population mean when μ = −0.25, 0, 0.25, 0.5, 1, σ = 1 and n = 20, 30, 50, 100.

https://doi.org/10.1371/journal.pone.0298307.g003

Table 1. Mean bias, mean squared error of the estimators of the population mean when σ = 1.

https://doi.org/10.1371/journal.pone.0298307.t001

3.3 Simulation study III

In this simulation we consider the case when δ = 0.5, which happens when μ = 0. We fix μ at 0 and take σ = 0.5, 1 and 2. The sample sizes range from 20 to 100. Table 2 shows the asymptotic 95% confidence intervals of δ. The estimate of δ is compared with that from the generalized fiducial distribution proposed by Hannig, which is a Beta distribution. It can be seen that the average length is smaller, which means that the estimate becomes more accurate. To illustrate this idea, we also test the hypotheses δ = 0.1 to 0.9 for the case δ = Φ(−μ/σ) and calculate the p-value under the null hypothesis. In fact, we can consider any function of μ and σ after drawing pairs of parameters from the posterior distribution. The null hypothesis is set to δ = 0.1, 0.3, 0.5, 0.7 and 0.9. We calculate the p-value for the associated delta-lognormal distribution and compare the result with the traditional one, which uses the Beta distribution Beta(n₀ + 0.5, n₁ + 0.5) as the generalized fiducial distribution for δ. For each given setting we generate 10000 samples and accept N = 10000 pairs of parameters. We calculate p_i = Φ(−μ_i/σ_i) for i = 1 to N and calculate the p-value for δ = p₀, which is 2 min{#{i : p_i ≤ p₀}/N, #{i : p_i > p₀}/N}.

Table 2. Coverage probability, lower limit and upper limit for δ = 0.5.

https://doi.org/10.1371/journal.pone.0298307.t002

The results are shown in Table 3. A and D in the column named “method” represent the associated delta-lognormal distribution and the traditional one, respectively. We can see that, for the same null hypothesis, the p-values for the associated delta-lognormal distribution are more concentrated than those for the traditional delta-lognormal distribution. This means that we are more likely to reject a false null hypothesis under the associated delta-lognormal distribution than under the traditional one.

Table 3. p-value of null hypothesis δ = p₀.

https://doi.org/10.1371/journal.pone.0298307.t003

4 A real data example

In this section, we use the data set of diagnostic test charges from the study in [27], see Table 4. This data set was analysed by [7], who showed that the positive part fits a lognormal distribution. The data set was further studied by [9, 14]. It contains 40 patients, 10 of whom had no diagnostic tests during the study period.

Table 4. Data set of the diagnostic test charges.

https://doi.org/10.1371/journal.pone.0298307.t004

We assume that the data set comes from an associated delta-lognormal population, where It can be calculated that . We assume that the data are drawn from the associated delta-lognormal distribution below, where

To test the goodness-of-fit, we choose k = 4 and create the partition, where a₁, a₂, a₃, a₄ are 250, 500, 900 and 3000, respectively. Given the level α = 0.05, the test statistic (7) is 3.916, which is smaller than . Thus the assumption of the model is accepted.

We give the confidence interval of the population mean using the method proposed in this paper. We accept N = 10000 pairs of (μ, σ) and calculate the 2.5% and 97.5% quantiles. As mentioned in the last section, the 2% and 97% quantiles are also considered since the distribution of σ is asymmetric. The result is compared with the fiducial method proposed by [14] and the “MOVER” proposed by [16], see Table 5. It can be seen that the confidence interval is largely improved.

Table 5. The confidence interval of the population mean using different methods.

https://doi.org/10.1371/journal.pone.0298307.t005
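The quantile step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the population mean is taken to be the usual delta-lognormal mean (1 − δ)·exp(μ + σ²/2), and the draws below are arbitrary placeholders rather than samples from the updated fiducial distribution of the diagnostic test charge data.

```python
import numpy as np


def delta_lognormal_mean(mu, sigma, delta):
    """Mean of a delta-lognormal population: (1 - delta) * exp(mu + sigma^2 / 2)."""
    return (1.0 - delta) * np.exp(mu + 0.5 * sigma**2)


def quantile_interval(mu, sigma, delta, lower=0.025, upper=0.975):
    """Interval for the population mean from accepted parameter draws.

    mu, sigma, delta are arrays of equal length (one entry per accepted draw).
    lower and upper are the tail quantiles; they need not be symmetric.
    """
    m = delta_lognormal_mean(np.asarray(mu), np.asarray(sigma), np.asarray(delta))
    lo, hi = np.quantile(m, [lower, upper])
    return float(lo), float(hi)


# Placeholder draws (an assumption for illustration only, NOT the paper's data).
rng = np.random.default_rng(0)
n_draws = 10_000
mu_draws = rng.normal(6.0, 0.2, size=n_draws)
sigma_draws = np.abs(rng.normal(1.2, 0.1, size=n_draws))
delta_draws = rng.beta(11, 31, size=n_draws)

ci_symmetric = quantile_interval(mu_draws, sigma_draws, delta_draws)
ci_shifted = quantile_interval(mu_draws, sigma_draws, delta_draws, 0.02, 0.97)
print("2.5%-97.5% interval:", ci_symmetric)
print("2%-97% interval:    ", ci_shifted)
```

Passing the tail quantiles as arguments makes it easy to compare the symmetric 2.5%/97.5% choice with the shifted 2%/97% choice on the same set of accepted draws.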

5 Results and discussion

In this paper, we consider the associated delta-lognormal distribution, in which δ is associated with the location and scale parameters of the lognormal distribution. To combine the information in the lognormal distribution with that in the discrete binomial distribution, we propose the updated fiducial distribution. We establish that the confidence interval has asymptotically correct level and that the significance level of the hypothesis test is also asymptotically correct. To obtain the confidence intervals and the p-values, we suggest using a rejection sampling scheme, motivated by approximate Bayesian computation, to sample from the distributions. The “prior distribution” for μ and σ is chosen to be the fiducial distribution, and the binomial likelihood function can be regarded as an update to this fiducial distribution. We further make inference on functions of the parameters. We use the special case δ = Φ(−μ/σ) to illustrate our idea. We give the confidence intervals of μ and σ for different sample sizes, and propose a method for testing hypotheses about functions of μ and σ. For cases in which there are both continuous and discrete data, we suggest first obtaining information from the continuous data. Such information is synthesized as a distribution, such as the fiducial distribution. This distribution is then updated by the discrete data through Bayes' theorem. For further study, motivated by the research on the delta-lognormal distribution, see for example [14, 17, 28], the difference or ratio between the parameters of two associated delta-lognormal distributions can be of interest, as well as the quantiles of the distribution.
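The updating idea summarized above can be illustrated with a minimal Python sketch for the special case δ = Φ(−μ/σ). Several things here are assumptions rather than the authors' implementation: the function name and arguments are hypothetical, the "prior" draws use the standard fiducial distribution for a normal sample of log-values (σ² = (n₁ − 1)S²/U with U ~ χ²(n₁ − 1), and μ = ȳ − Zσ/√n₁ with Z ~ N(0, 1)), the acceptance rule keeps a draw when a simulated binomial count of zeros equals the observed count, and the data are simulated.

```python
import numpy as np
from scipy.stats import norm


def sample_updated_fiducial(log_pos, n_zero, n_accept, rng,
                            batch=20_000, max_batches=1_000):
    """Rejection sampler for (mu, sigma) under delta = Phi(-mu / sigma).

    log_pos : logs of the positive observations (length n1 >= 2)
    n_zero  : observed number of zeros
    Returns arrays (mu, sigma, delta), each of length n_accept.
    """
    log_pos = np.asarray(log_pos, dtype=float)
    n1 = log_pos.size
    if n1 < 2:
        raise ValueError("need at least two positive observations")
    n = n1 + n_zero
    ybar = log_pos.mean()
    s2 = log_pos.var(ddof=1)

    kept_mu, kept_sigma, kept = [], [], 0
    for _ in range(max_batches):
        # Step 1: draw (mu, sigma) from the fiducial distribution of the log data.
        u = rng.chisquare(n1 - 1, size=batch)
        z = rng.standard_normal(batch)
        sigma = np.sqrt((n1 - 1) * s2 / u)
        mu = ybar - z * sigma / np.sqrt(n1)
        # Step 2: simulate a zero count and keep draws that match the observed one.
        delta = norm.cdf(-mu / sigma)
        keep = rng.binomial(n, delta) == n_zero
        kept_mu.append(mu[keep])
        kept_sigma.append(sigma[keep])
        kept += int(keep.sum())
        if kept >= n_accept:
            break
    else:
        raise RuntimeError("acceptance rate too low; increase max_batches")

    mu_out = np.concatenate(kept_mu)[:n_accept]
    sigma_out = np.concatenate(kept_sigma)[:n_accept]
    return mu_out, sigma_out, norm.cdf(-mu_out / sigma_out)


# Simulated data (hypothetical true values, for illustration only).
rng = np.random.default_rng(1)
mu_true, sigma_true, n = 0.5, 1.0, 50
is_zero = rng.random(n) < norm.cdf(-mu_true / sigma_true)
n_zero = int(is_zero.sum())
log_pos = rng.normal(mu_true, sigma_true, size=n - n_zero)

mu_s, sigma_s, delta_s = sample_updated_fiducial(log_pos, n_zero, 2_000, rng)
print("95% interval for mu:   ", np.quantile(mu_s, [0.025, 0.975]))
print("95% interval for sigma:", np.quantile(sigma_s, [0.025, 0.975]))
```

Matching the simulated zero count to the observed one accepts each draw with probability proportional to the binomial likelihood δ^{n₀}(1 − δ)^{n₁}, so the accepted draws follow the fiducial density reweighted by that likelihood without the likelihood being evaluated explicitly.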

Supporting information

S1 File.

https://doi.org/10.1371/journal.pone.0298307.s001

(ZIP)

References

1. Aitchison J. On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 1955, 50, 901–908.
2. Owen W.J.; Derouen T.A. Estimation of the Mean for Lognormal Data Containing Zeroes and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 1980, 36, 707.
3. Lo N.C.H.; Jacobson L.D.; Squire J.L. Indices of Relative Abundance from Fish Spotter Data based on Delta-Lognormal Models. Canadian Journal of Fisheries and Aquatic Sciences 1992, 49, 2515–2526.
4. Pennington M. On Testing the Robustness of Lognormal-based estimators. Biometrics 1991, 47, 1623–1624.
5. Smith S.J. Evaluating the efficiency of the δ-distribution mean estimator. Biometrics 1988, 44, 485–493.
6. Zhou X.H.; Tu W. Comparison of Several Independent Population Means When Their Samples Contain Log-Normal and Possibly Zero Observations. Biometrics 1999, 55, 645–651.
7. Zhou X.H.; Tu W. Confidence Intervals for the Mean of Diagnostic Test Charge Data Containing Zeros. Biometrics 2000, 56, 1118–1125. pmid:11129469
8. Fletcher D. Confidence intervals for the mean of the delta-lognormal distribution. Environmental and Ecological Statistics 2007, 15, 175–189.
9. Tian L. Inferences on the mean of zero-inflated lognormal data: the generalized variable approach. Statistics in Medicine 2005, 24, 3223–3232. pmid:16189811
10. Tian L.; Wu J. Confidence Intervals for the Mean of Lognormal Data with Excess Zeros. Biometrical Journal 2006, 48, 149–156. pmid:16544820
11. Tsui K.W.; Weerahandi S. Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. Journal of the American Statistical Association 1989, 84, 602–607.
12. Li X.; Zhou X.; Tian L. Interval estimation for the mean of lognormal data with excess zeros. Statistics and Probability Letters 2013, 83, 2447–2453.
13. Wu W.H.; Hsieh H.N. Generalized confidence interval estimation for the mean of delta-lognormal distribution: an application to New Zealand trawl survey data. Journal of Applied Statistics 2014, 41, 1471–1485.
14. Hasan M.S.; Krishnamoorthy K. Confidence intervals for the mean and a percentile based on zero-inflated lognormal data. Journal of Statistical Computation and Simulation 2018, 88, 1499–1514.
15. Hannig J. On Generalized Fiducial Inference. Statistica Sinica 2009, 19, 491–544.
16. Zou G.Y.; Taleban J.; Huo C.Y. Confidence interval estimation for lognormal data with application to health economics. Computational Statistics and Data Analysis 2009, 53, 3755–3764.
17. Harvey J.; van der Merwe A. Bayesian confidence intervals for means and variances of lognormal and bivariate lognormal distributions. Journal of Statistical Planning and Inference 2012, 142, 1294–1309.
18. Singh K.; Xie M.; Strawderman W.E. Combining information from independent sources through confidence distributions. The Annals of Statistics 2005, 33.
19. Marjoram P.; Molitor J.; Plagnol V.; Tavaré S. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 2003, 100, 15324–15328. pmid:14663152
20. Sisson S.A.; Fan Y.; Tanaka M.M. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 2007, 104, 1760–1765. pmid:17264216
21. Del Moral P.; Doucet A.; Jasra A. Sequential Monte Carlo Samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology 2006, 68, 411–436.
22. Dawid A.P.; Stone M. The functional-model basis of fiducial inference. The Annals of Statistics 1982, 10, 1054–1074.
23. Liu L.; Shih Y.C.T.; Strawderman R.L.; Zhang D.; Johnson B.A.; Chai H. Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review. Statistical Science 2019, 34.
24. van der Vaart A.W. Asymptotic Statistics; Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 1998.
25. Pritchard J.K.; Seielstad M.T.; Perez-Lezaun A.; Feldman M.W. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution 1999, 16, 1791–1798. pmid:10605120
26. Tavaré S.; Balding D.J.; Griffiths R.C.; Donnelly P. Inferring Coalescence Times From DNA Sequence Data. Genetics 1997, 145, 505–518. pmid:9071603
27. Callahan C.M. Association of Symptoms of Depression with Diagnostic Test Charges among Older Adults. Annals of Internal Medicine 1997, 126, 426. pmid:9072927
28. Maneerat P.; Niwitpong S.A.; Niwitpong S. Bayesian confidence intervals for a single mean and the difference between two means of delta-lognormal distributions. Communications in Statistics - Simulation and Computation 2021, 50, 2906–2934.

