Abstract
In this paper we consider a special kind of semicontinuous distribution. We address the situation where the probability of a zero observation is associated with the location and scale parameters of the lognormal distribution. We first propose a goodness-of-fit test to check that the data can be fitted by the associated delta-lognormal distribution. We then define the updated fiducial distributions of the parameters and establish that the confidence intervals have asymptotically correct levels and that the significance levels of the hypothesis tests are also asymptotically correct. We propose an exact sampling method to sample from the updated fiducial distribution. Our simulation study shows that inference on the parameters is largely improved. A real data example is also used to illustrate our method.
Citation: Wang Y, Xu X (2024) Updated fiducial distribution of parameters in the associated delta-lognormal population. PLoS ONE 19(6): e0298307. https://doi.org/10.1371/journal.pone.0298307
Editor: Jiangtao Gou, Villanova University, UNITED STATES
Received: August 8, 2023; Accepted: January 17, 2024; Published: June 5, 2024
Copyright: © 2024 Wang, Xu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript.
Funding: One author received funding from the National Natural Science Foundation of China: 11471035 and 11471030, URL: https://www.nsfc.gov.cn. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
In real applications, such as fisheries research and medical cost analysis, the response variables may be skewed, non-negative and have a non-negligible probability of zero outcomes. Such variables are said to follow a semicontinuous distribution
$$F(x) = \delta + (1 - \delta)\,G(x), \quad x \ge 0,$$
where δ is the probability of a zero outcome and G(x) is the cumulative distribution function of a positive variable. In existing research, δ is usually assumed to be independent of G(x). However, we think that the probability δ may be associated with G. For example, consider the precipitation distributions of certain areas: areas with larger annual rainfall are likely to have fewer dry days. Hence, we can assume that δ is associated with G, say δ = G(a) for some a. In this paper, we deal with such an assumption for a specific distribution, the delta-lognormal distribution. This kind of distribution was first discussed and named by [1]. The cumulative distribution function of the delta-lognormal distribution is defined as follows:
$$F(x; \delta, \mu, \sigma) = \delta + (1 - \delta)\,F_{LN}(x; \mu, \sigma), \quad x \ge 0,$$
where
$$F_{LN}(x; \mu, \sigma) = \Phi\!\left(\frac{\log x - \mu}{\sigma}\right), \quad x > 0,$$
denotes the cumulative distribution function of a lognormal distribution. Since log X follows a normal distribution N(μ, σ2), we still refer to μ and σ as the location and scale parameters respectively in the rest of our paper.
[2] applied this distribution to the measurement of worker exposure to air contaminants in the United States. The delta-lognormal distribution was applied to fisheries research by [3–5], who considered estimates of the population mean of the delta-lognormal distribution and further studied their robustness. The mean of the delta-lognormal distribution is easily calculated as
$$M = (1 - \delta)\exp\!\left(\mu + \frac{\sigma^2}{2}\right).$$
Much attention has been given to the confidence interval of M by various statisticians. [6] proposed using the likelihood ratio test to obtain better control of the Type I error than the standard ANOVA F-test and the Kruskal-Wallis test. A bootstrap approach, proved to be second-order accurate, was proposed in [7]. [8] considered the case when at least two nonzero observations are observed and modified the profile log-likelihood function. [9, 10] used the generalized pivotal quantities proposed by [11] to construct a generalized pivot for estimating the mean. In their papers, a Beta distribution is used as the generalized pivot for δ. This idea was further developed by [12–14]. In the papers mentioned above, generalized pivotal quantities are proposed for the binomial variable, which is discrete. Meanwhile, the conclusion of [15] on generalized fiducial inference also motivates some new ideas. The most recent results are given in [14], where the authors focus mainly on improving the Beta distribution used to approximate the generalized fiducial distribution of δ.
Instead of finding a generalized fiducial distribution, [16] proposed another method, the “method of variance estimates recovery” (MOVER). This method can easily be applied to many different settings while guaranteeing the coverage probability of the confidence interval. From the Bayesian perspective, [17] compared the performance of different prior distributions for both the lognormal distribution and the delta-lognormal distribution. They further considered the comparison of the means of two lognormal populations.
As the review above shows, the three parameters of the delta-lognormal distribution are assumed to be independent. However, in real applications, the probability of zero outcomes may be associated with the location and scale parameters. Consider the spending on children’s clothing in [1]: a family in a rich community is more likely to be a spender, while one in a relatively poor community may be a nonspender, since a family is easily influenced by other families in the same community. It is natural to assume that the probability of being a nonspender in a community with large μ and σ is smaller than in a community with small μ and σ. Similar cases illustrate that in real applications, δ may depend on the other two parameters. We refer to this special kind of distribution as an associated delta-lognormal distribution. Thus, we can learn information about μ and σ from both the nonzero observations and the number of zero observations.
Assume that δ is a known function of μ and σ. The unknown parameters of the associated delta-lognormal distribution thus become (μ, σ). In this paper we give the fiducial distributions and make inference on the parameters. The idea is that we first obtain the fiducial distribution from the nonzero observations and then update it using the number of zero observations, which follows a binomial distribution with success probability δ(μ, σ). The updating approach is motivated by Bayes’ theorem. The fiducial distribution of (μ, σ) from the nonzero observations is regarded as the “prior distribution” and is combined with the binomial distribution to get the “posterior distribution”, which is referred to as the updated fiducial distribution. We further make inference on μ, σ and functions of them using this updated fiducial distribution. The updated fiducial distribution of (μ, σ) is not derived from statistics that are asymptotically normal, so the asymptotic results for fiducial distributions given by [15] are no longer applicable here. Coincidentally, the updated fiducial distribution is the posterior distribution under the prior 1/σ. We show that this updated fiducial distribution satisfies the Bernstein-von Mises theorem. Then we show that the marginal fiducial distributions of the parametric functions are asymptotic confidence distributions as defined in [18]. Therefore, the confidence intervals of the parametric functions have asymptotically correct confidence levels. The significance levels of the hypothesis tests are also asymptotically correct. For the computation, we employ rejection sampling motivated by the approximate Bayesian computation method, see [19–21]. Although more sophisticated sampling methods exist, our method remains attractive because of its simplicity and exactness. We show in the simulation study that inference is largely improved by combining the continuous and discrete data.
The rest of the article is organized as follows. In Section 2, we introduce the associated delta-lognormal distribution and propose the updated fiducial distribution of the parameters. We further present approaches of confidence interval estimation and hypothesis testing of the parameters. Their frequentist properties are also given. We conduct simulations in Section 3 and use a real data example to illustrate our method in Section 4. We give our conclusion in the last section.
2 Methodology: Associated delta-lognormal distribution
In the articles mentioned earlier, the three parameters of the delta-lognormal distribution are always assumed to be independent. In this section, we consider the case when δ is associated with θ = (μ, σ). We assume that δ is a function of the location and scale parameters, denoted by δ(μ, σ). This means that an observation in a sample generated from the distribution is 0 with probability δ(μ, σ), and the nonzero observations follow a lognormal distribution with parameters μ and σ, which is denoted by LN(μ, σ). The cumulative distribution function of the associated delta-lognormal population is
$$F(x; \mu, \sigma) = \delta(\mu, \sigma) + [1 - \delta(\mu, \sigma)]\,F_{LN}(x; \mu, \sigma), \quad x \ge 0,$$
where FLN(x; μ, σ) is the cumulative distribution function of LN(μ, σ).
A sample from this population is denoted by X = (X1, X2, ⋯, Xn). We assume that N0 observations are zero while the remaining N1 = n − N0 are nonzero. The likelihood function for the number of zero observations N0 is
$$L(\mu, \sigma \mid n_0) = \binom{n}{n_0}\,\delta(\mu, \sigma)^{n_0}\,[1 - \delta(\mu, \sigma)]^{n_1}, \tag{1}$$
where n0 is the observed value of N0 and n1 = n − n0.
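The binomial likelihood of the zero count can be evaluated directly once a form for δ(μ, σ) is fixed. The following is a minimal sketch, assuming the link δ(μ, σ) = Φ((a − μ)/σ) that is used later in the simulation study; the function names `delta` and `zero_count_likelihood` are illustrative and not part of the paper.

```python
from math import comb
from statistics import NormalDist


def delta(mu, sigma, a=0.0):
    """Assumed link: probability of a zero, delta = Phi((a - mu) / sigma)."""
    return NormalDist().cdf((a - mu) / sigma)


def zero_count_likelihood(n0, n, mu, sigma, a=0.0):
    """Binomial likelihood of observing n0 zeros among n observations."""
    d = delta(mu, sigma, a)
    return comb(n, n0) * d ** n0 * (1.0 - d) ** (n - n0)


# Example: 10 zeros out of 40 observations at a trial value of (mu, sigma).
lik = zero_count_likelihood(n0=10, n=40, mu=0.5, sigma=1.0)
```

Only the dependence on (μ, σ) through δ matters for the updating step, so the binomial coefficient could be dropped when the likelihood is used up to proportionality.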
2.1 Updated fiducial distribution
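Without loss of generality, we assume that the first N1 observations are nonzero while the rest are 0, that is, Xi > 0 for i ≤ N1 and Xi = 0 for i > N1. Given N1 = n1, the nonzero observations
$$X_1, X_2, \cdots, X_{n_1}$$
are from LN(μ, σ). Let n1 ≥ 2. A log-transformation is applied to the observations, Yi = log Xi for i = 1, 2, ⋯, n1. Then the sample mean and variance
$$\bar{Y} = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_i, \qquad S^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1} (Y_i - \bar{Y})^2$$
follow a normal and a scaled χ2(n1 − 1) distribution respectively, that is,
$$\bar{Y} \sim N\!\left(\mu, \frac{\sigma^2}{n_1}\right), \qquad \frac{(n_1 - 1)S^2}{\sigma^2} \sim \chi^2(n_1 - 1).$$
Let U ∼ N(0, 1) and V ∼ χ2(n1 − 1) be two independent random variables. Then we have
$$\bar{Y} = \mu + \frac{\sigma}{\sqrt{n_1}}\,U, \qquad (n_1 - 1)S^2 = \sigma^2 V.$$
Given
$$\bar{Y} = \bar{y}$$
and S2 = s2, μ and σ can be regarded as functions of U and V:
$$\mu = \bar{y} - \frac{U}{\sqrt{n_1}}\sqrt{\frac{(n_1 - 1)s^2}{V}}, \qquad \sigma = \sqrt{\frac{(n_1 - 1)s^2}{V}}.$$
The joint density of (U, V) is
$$f(u, v) = \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\cdot\frac{1}{2^{(n_1-1)/2}\,\Gamma\!\left(\frac{n_1-1}{2}\right)}\,v^{\frac{n_1-1}{2}-1}e^{-v/2}, \quad u \in \mathbb{R},\ v > 0.$$
Then the joint density of (μ, σ) can be calculated by a change of variables as
$$\pi_F(\mu, \sigma \mid x_{obs}) = C\,\sigma^{-(n_1+1)}\exp\!\left\{-\frac{n_1(\bar{y} - \mu)^2 + (n_1 - 1)s^2}{2\sigma^2}\right\}, \tag{2}$$
where
$$C = \sqrt{\frac{n_1}{\pi}}\;\frac{\left[(n_1 - 1)s^2\right]^{(n_1-1)/2}}{2^{\,n_1/2 - 1}\,\Gamma\!\left(\frac{n_1-1}{2}\right)}$$
is the normalizing constant.
This means that the fiducial distribution of (μ, σ) is
$$\mu = \bar{y} - \frac{U}{\sqrt{n_1}}\sqrt{\frac{(n_1 - 1)s^2}{V}}, \qquad \sigma = \sqrt{\frac{(n_1 - 1)s^2}{V}}. \tag{3}$$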
If n1 < 2, we take
(4)
where
when n1 = 0 and
when n1 = 1. Then the fiducial density πF(μ, σ|xobs) in (2) is obtained for all n1 ≥ 0.
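For the case n1 ≥ 2, a draw from the fiducial distribution of (μ, σ) can be generated from one standard normal and one chi-square variate. The sketch below assumes the unbiased sample variance (divisor n1 − 1) for s2 and does not cover the n1 < 2 case; the function name `sample_fiducial` is illustrative.

```python
import math
import random


def sample_fiducial(y, rng):
    """Draw one (mu, sigma) from the fiducial distribution given log-data y.

    Assumes len(y) >= 2 and the unbiased sample variance for s^2.
    """
    n1 = len(y)
    ybar = sum(y) / n1
    s2 = sum((v - ybar) ** 2 for v in y) / (n1 - 1)
    u = rng.gauss(0.0, 1.0)
    # chi-square(n1 - 1) is a Gamma distribution with shape (n1 - 1)/2, scale 2
    v = rng.gammavariate((n1 - 1) / 2.0, 2.0)
    sigma = math.sqrt((n1 - 1) * s2 / v)
    mu = ybar - u * sigma / math.sqrt(n1)
    return mu, sigma


rng = random.Random(1)
y = [0.1, -0.3, 0.8, 1.2, 0.4]  # hypothetical log-transformed nonzero data
draws = [sample_fiducial(y, rng) for _ in range(4000)]
```

The marginal draws of μ are symmetric around the sample mean of y, which gives a quick sanity check on the sampler.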
The fiducial distribution for the lognormal distribution was first given by [22]. However, there is no commonly accepted fiducial distribution for a binomial variable. A generalized fiducial quantity was proposed by [15], which is the Beta distribution Beta(n0, n1 + 1). Improvements on the parameters of the Beta distribution were further proposed by [12, 14], namely 0.5[Beta(n0, n1 + 1) + Beta(n0 + 1, n1)] and Beta(n0 + 0.5, n1 + 0.5), respectively.
Now we consider the problem from the Bayesian perspective, without the need for generalized fiducial quantities. In Bayesian inference, the prior beliefs about the model parameters θ, say π(θ), are updated by observing data yobs through the likelihood function of the model. We denote the likelihood function by p(yobs|θ) and use Bayes’ theorem to get the posterior distribution
$$\pi(\theta \mid y_{obs}) = \frac{p(y_{obs} \mid \theta)\,\pi(\theta)}{\int p(y_{obs} \mid \theta)\,\pi(\theta)\,d\theta}. \tag{5}$$
The prior distribution is often specified by choosing a tractable distribution that we believe the parameters should obey. For the associated delta-lognormal distribution, the prior distribution of (μ, σ) is naturally chosen to be the fiducial distribution (2), which is then updated by the likelihood function (1). We define the updated fiducial distribution of (μ, σ) as
$$\pi_{UF}(\mu, \sigma \mid x_{obs}) \propto \pi_F(\mu, \sigma \mid x_{obs})\,\delta(\mu, \sigma)^{n_0}\,[1 - \delta(\mu, \sigma)]^{n_1}, \tag{6}$$
where “∝” means “proportional to”.
2.2 Goodness-of-fit test
Let the observations be x1, x2, ⋯, xn. We take δ = G(x0), where x0 is a preset value and G is the cumulative distribution function of the continuous part. In this paper, we consider the case when G is the lognormal distribution, so that
$$\delta(\mu, \sigma) = \Phi\!\left(\frac{\log x_0 - \mu}{\sigma}\right).$$
In real applications, x0 may be known. For example, in the Tobit model, see [23],
then x0 = ymin. When x0 is unknown, we can obtain x0 with the following method.
Let n0 and n1 be the numbers of zero and nonzero observations, respectively. Without loss of generality, let x1, x2, ⋯, xn1 be the nonzero ones. Then μ and σ are estimated by
Then
Let
, then
Thus, the associated delta can be given by
To test the goodness of fit, the classical Kolmogorov-Smirnov test is no longer suitable for the zero-inflated model. We consider using Pearson’s chi-square test. The following partition is made of the interval [0, ∞): {0}, (0, a1], (a1, a2], ⋯, (ak, ∞). Let
where a0 = 0. Then p0, p1, ⋯, pk are estimated by
Let mi be the number of samples in the interval (ai, ai+1), where ak+1 = ∞. We can then construct the following test statistic,
(7)
Then
Given the significance level α, the model of associated delta-lognormal distribution is accepted when
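The partition-based test can be sketched in code. The example below assumes that the statistic takes the usual Pearson form, the sum over cells of (observed − expected)²/expected, with cell probabilities δ for {0} and (1 − δ)[F_LN(a_{i+1}) − F_LN(a_i)] for the positive cells. The parameter values passed in are taken as already estimated, and the degrees of freedom for the chi-square reference distribution are left to the caller. All function names are illustrative.

```python
import math
from statistics import NormalDist


def cell_probabilities(cuts, mu, sigma, delta):
    """Probabilities of {0}, (0, a1], ..., (ak, inf) under the fitted model."""
    phi = NormalDist().cdf
    cdf = [0.0] + [phi((math.log(a) - mu) / sigma) for a in cuts] + [1.0]
    positive = [(1.0 - delta) * (cdf[i + 1] - cdf[i]) for i in range(len(cdf) - 1)]
    return [delta] + positive


def pearson_statistic(x, cuts, mu, sigma, delta):
    """Pearson chi-square statistic for data x and increasing cut points."""
    n = len(x)
    probs = cell_probabilities(cuts, mu, sigma, delta)
    counts = [0] * len(probs)
    for v in x:
        if v == 0:
            counts[0] += 1
        else:
            # number of cut points strictly below v picks the cell (a_j, a_{j+1}]
            j = sum(1 for a in cuts if v > a)
            counts[1 + j] += 1
    return sum((m - n * p) ** 2 / (n * p) for m, p in zip(counts, probs))


# Hypothetical toy data: two zeros and two positive values, one cut point.
stat = pearson_statistic([0, 0, 1.0, 2.0], cuts=[1.5], mu=0.0, sigma=1.0, delta=0.5)
```

The model would then be retained when the statistic falls below the chosen chi-square critical value.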
2.3 Inference on functions of parameters
Assume that (μ, σ) follows the updated fiducial distribution πUF(μ, σ|xobs). Let G = g(μ, σ) which is a random variable. Then we denote the marginal fiducial distribution of g(μ, σ) by and the cumulative distribution function of
by
.
Confidence interval.
The confidence interval of g(μ, σ) with confidence level 1 − α is given by
(8)
where
, 0 < γ < 1, satisfies
Hypothesis testing.
For the one-sided hypothesis
The p-value is defined as
(9)
For the two-sided hypothesis
The p-value is then given by
(10)
Now we investigate the frequentist properties of the confidence interval and the hypothesis testing. First we define the random variable Zi as
$$Z_i = \begin{cases} 1, & X_i = 0, \\ 0, & X_i > 0, \end{cases}$$
where i = 1, 2, ⋯, n. Then (Z1, X1), (Z2, X2), ⋯, (Zn, Xn) are independent and identically distributed with density f(z, x; μ, σ) given below. The population sample space is then
and the dominating measure
, where
is the counting measure on {0, 1} and LN(0, 1) is the standard lognormal distribution, which has the density
$$\frac{1}{x\sqrt{2\pi}}\exp\!\left\{-\frac{(\log x)^2}{2}\right\}, \quad x > 0.$$
When x = 0, we define the function above as its limit 0.
The density f(z, x; μ, σ) with respect to ν is
(11)
where
, θ = (μ, σ) ∈ Ω = (−∞, ∞) × (0, ∞).
We first check that f(z, x; μ, σ) is a probability density function. It can be seen that when Z = 1, X = 0, the density is
while when Z = 0, X = x, the density becomes
Then we integrate f(z, x; μ, σ) on with respect to ν
This indicates that f(z, x; μ, σ) is a density function with respect to ν.
Then we show that the family (11) is quadratic mean differentiable, which is defined below.
Definition 1 (Quadratic Mean Differentiable) The family {Pθ, θ ∈ Ω} is quadratic mean differentiable at θ0 if there exists a vector of real-valued functions such that, as θ → θ0,
To verify that a family is quadratic mean differentiable, a lemma below is used in this paper.
Lemma 1 ([24]). For every θ in an open subset of Rk, let pθ be a probability density. Assume that the map θ ↦ √pθ(x) is continuously differentiable for every x. If the elements of the Fisher information matrix Iθ are well defined and continuous in θ, then the density pθ is quadratic mean differentiable.
Hence we can establish the following proposition.
Proposition 2 Assume that 0 < δ(μ, σ) < 1 and δ(μ, σ) is continuously differentiable for all −∞ < μ < +∞ and σ > 0. Then the density f(z, x; μ, σ) is differentiable in quadratic mean.
The proof of this proposition is given in S1 File.
Given the observation (z1, x1), ⋯, (zn, xn), we have the likelihood function as
Notice that when n1 ≥ 2, the updated fiducial distribution has the form
where y = log x. With simple calculation we can get
This means that the updated fiducial distribution can be regarded as a posterior distribution under the prior distribution 1/σ.
When n → ∞,
(12)
Therefore we can apply the famous Bernstein-von Mises Theorem below to the updated fiducial distribution.
Lemma 3 (Bernstein-von Mises Theorem, [24]) Let the experiment (Pθ : θ ∈ Ω) be differentiable in quadratic mean at θ0 with nonsingular Fisher information matrix Iθ0, and suppose that for every ε > 0 there exists a sequence of tests ψn such that
Furthermore, let the prior measure be absolutely continuous in a neighborhood of θ0 with a continuous positive density at θ0. Then the corresponding posterior distributions satisfy
(13)
At the moment we explain notations in (13). The symbol is the posterior density of
while
is a normal distribution with mean
and variance
. The norm ‖f − g‖ means
$$\int |f(\theta) - g(\theta)|\,d\theta,$$
which is the L1 distance between densities f and g. Thus we can obtain the result below.
Theorem 4 Under the assumptions of Proposition 2, Bernstein-von Mises theorem holds when the posterior distribution is replaced by the updated fiducial distribution πUF(μ, σ|xobs).
The proof of Theorem 4 is given in S1 File.
To explore the frequentist properties of the functions of parameters under the updated fiducial distribution, we give the definitions of the confidence distribution and the asymptotic confidence distribution, which were proposed by [18].
Definition 2 A function Hn(⋅) = Hn(Xn, ⋅) on is called a confidence distribution for a parameter θ if (i) for each given
, Hn(⋅) is a continuous cumulative distribution function; (ii) at the true parameter value θ = θ0, Hn(⋅, θ0) = Hn(Xn, θ0), as a function of the sample Xn, has the uniform distribution U(0, 1). The function Hn(⋅) is called asymptotic confidence distribution if requirement (ii) above is replaced by (ii)’ : at θ = θ0,
as n → + ∞, and the continuity requirement on Hn(⋅) is dropped.
The notation “” means convergence in distribution.
Given n1 ≥ 2, under the fiducial distribution (2), it is well known that the marginal fiducial distributions of μ and σ are confidence distributions. However, under the updated fiducial distribution (6), the fiducial distribution (2) is updated by the discrete variable N1. Thus the marginal fiducial distributions are no longer confidence distributions. Besides μ and σ, we also consider functions of them. We have the following theorem.
Theorem 5 Let g(μ, σ) = K(aμ + bσ), where K is a strictly monotone increasing function. Then under the assumptions of Proposition 2, the marginal updated fiducial distribution of g is an asymptotic confidence distribution.
The proof of Theorem 5 is given in S1 File.
Applying this theorem to different functions g(μ, σ), we get the corollary below.
Corollary 6 The marginal updated fiducial distributions of the following functions are all asymptotic confidence distributions:
- (i) g1(μ, σ) = μ;
- (ii) g2(μ, σ) = σ;
- (iii) g3(μ, σ) = exp[μ + Φ−1(γ)σ], the γ quantile of LN(μ, σ);
- (iv) g4(μ, σ) = Φ[(log x0 − μ)/σ], the cumulative distribution function of LN(μ, σ) at x0;
- (v) g5(μ, σ) = (1 − δ(μ, σ)) exp (μ + σ2/2), the population mean, when
(14)
The proof of Corollary 6 is given in S1 File.
An example to (v) in Corollary 6 is δ(μ, σ) = Φ(−μ/σ). We can see that
which satisfies (14). The following proposition guarantees the level of both the confidence interval and the hypothesis testing.
Proposition 7 If the marginal updated fiducial distribution of g(μ, σ) is an asymptotic confidence distribution, then the level of the confidence interval is asymptotically 1 − α and the significance level of the hypothesis testing is asymptotically α.
The proof of Proposition 7 is given in S1 File.
From Proposition 7, if g(μ, σ) is taken as in Theorem 5 or Corollary 6, the confidence intervals in (8) and the p-values in (9) and (10) are asymptotically correct when n → ∞. For moderate sample sizes n, we give simulations in the next section.
2.4 Sampling from the updated fiducial distribution
To give the confidence intervals of the parameters, we need to compute the γ-quantiles of the updated fiducial distributions. Similarly, to give the p-values of the hypothesis tests, we need to compute the cumulative distribution functions of the marginal updated fiducial distributions at g0. However, it is difficult to obtain them in closed form. Fortunately, we can adopt a simple method that produces an exact sample from the updated fiducial distribution, namely rejection sampling.
We draw parameters from the “prior distribution” and accept those that generate the same number of zeros as the observed data. This is similar to the rejection-ABC method first proposed by [25, 26]. However, it should be noted that there is no approximation error in our sampling method for the associated delta-lognormal distribution, since we do not use summary statistics and accept only the parameters that generate exactly the same number of zeros. Thus the accepted parameters are equivalent to a sample from the true posterior distribution (6).
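Without loss of generality, assume first that the observed sample of size n is x1, x2, ⋯, xn, where xi > 0 for i = 1, ⋯, n1 and the remaining n0 = n − n1 observations are zero. A log-transformation is then applied to the nonzero observations, yi = log xi for i = 1, ⋯, n1. Then the fiducial distribution of μ and σ is given by (3), that is,
$$\mu = \bar{y} - \frac{U}{\sqrt{n_1}}\sqrt{\frac{(n_1 - 1)s^2}{V}}, \qquad \sigma^2 = \frac{(n_1 - 1)s^2}{V}, \tag{15}$$
where ȳ and s2 are the sample mean and sample variance of y1, ⋯, yn1, U is a standard normal random variable and V is an independent χ2(n1 − 1) random variable.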
- A log-transformation is applied to the nonzero observations, giving y1, y2, ⋯, yn1. The sample mean and sample variance are calculated and denoted by ȳ and s2.
- If n1 ≥ 2, sample U from the standard normal distribution and V from the χ2(n1 − 1) distribution, respectively. To sample from the fiducial distribution of the parameters, we simply calculate μ and σ2 using (15). If n1 < 2, we draw samples from (4).
- Calculate δ = δ(μ, σ) and draw a sample from the binomial distribution B(n, δ(μ, σ)). We accept the parameters if the number of zeros equals n0.
- The process is repeated until a given number of parameters has been accepted.
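The steps above can be sketched as a short rejection sampler. This is an illustration under stated assumptions rather than the authors' implementation: it takes δ(μ, σ) = Φ((a − μ)/σ) as in the simulation study, uses the unbiased sample variance for s2, and handles only n1 ≥ 2. The function name `sample_updated_fiducial` and the toy data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_updated_fiducial(x, n_accept, a=0.0, seed=0):
    """Rejection sampler for (mu, sigma) given data x containing zeros."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    n = len(x)
    y = [math.log(v) for v in x if v > 0]  # step 1: log-transform nonzero data
    n1 = len(y)
    n0 = n - n1
    if n1 < 2:
        raise ValueError("this sketch only covers n1 >= 2")
    ybar = sum(y) / n1
    s2 = sum((v - ybar) ** 2 for v in y) / (n1 - 1)
    accepted = []
    while len(accepted) < n_accept:
        # step 2: draw (mu, sigma) from the fiducial "prior"
        u = rng.gauss(0.0, 1.0)
        v = rng.gammavariate((n1 - 1) / 2.0, 2.0)  # chi-square(n1 - 1)
        sigma = math.sqrt((n1 - 1) * s2 / v)
        mu = ybar - u * sigma / math.sqrt(n1)
        # step 3: simulate the zero count from Binomial(n, delta) and compare
        d = phi((a - mu) / sigma)
        zeros = sum(1 for _ in range(n) if rng.random() < d)
        if zeros == n0:
            accepted.append((mu, sigma))
    return accepted


# Hypothetical sample: 4 zeros and 8 positive observations.
x = [0, 0, 0, 0, 1.1, 0.74, 2.2, 3.3, 1.5, 4.5, 0.82, 1.8]
pairs = sample_updated_fiducial(x, n_accept=200)
```

Because acceptance requires an exact match of the zero count, the acceptance rate falls as n grows, which is the price paid for having no approximation error.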
With the sample from the updated fiducial distribution, we then consider inference on the scalar function g(μ, σ). We first assume that a given number, say N, of parameter pairs are accepted by the rejection sampling method. We denote these pairs by (μ1, σ1), (μ2, σ2), ⋯, (μi, σi), ⋯, (μN, σN). For the function G = g(μ, σ), let gi = g(μi, σi), i = 1, 2, ⋯, N.
Confidence interval.
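The confidence interval (8) of g(μ, σ) can be computed as follows. We sort g1, g2, ⋯, gN in ascending order,
$$g_{(1)} \le g_{(2)} \le \cdots \le g_{(N)}.$$
Then we take
$$\left(g_{([N\alpha/2])},\; g_{([N(1 - \alpha/2)])}\right), \tag{16}$$
where [a] is the largest integer not larger than the number a.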
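As an illustration, the interval can be read off the sorted sample by taking the order statistics at ranks [Nα/2] and [N(1 − α/2)]. That choice of ranks is an assumption about the form of (16), and the small tolerance and rank clamping below are implementation details added here, not part of the paper.

```python
import math


def fiducial_interval(g_values, alpha=0.05):
    """Equal-tailed interval from sampled values of g(mu, sigma)."""
    g_sorted = sorted(g_values)
    n = len(g_sorted)
    eps = 1e-9  # guards against floating-point error just below an integer
    lo_rank = max(math.floor(n * alpha / 2 + eps), 1)
    hi_rank = min(math.floor(n * (1 - alpha / 2) + eps), n)
    # ranks are 1-based order statistics, Python lists are 0-based
    return g_sorted[lo_rank - 1], g_sorted[hi_rank - 1]


# Toy check with the values 1, 2, ..., 1000 in shuffled order.
values = list(range(1000, 0, -1))
lower, upper = fiducial_interval(values, alpha=0.05)
```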
Hypothesis testing.
The first hypothesis tests whether (μ, σ) lies in a nondegenerate region, that is, the null hypothesis is (μ, σ) ∈ Ω0 where Ω0 ⊂ ℜ × ℜ+. To test this hypothesis, we simply calculate the proportion of the pairs (μi, σi) contained in Ω0 and take this value as the p-value,
$$p = \frac{\#\{i : (\mu_i, \sigma_i) \in \Omega_0\}}{N},$$
where #A means the number of elements of the set A.
We also consider testing the null hypothesis H0 : θ = θ0 versus H1 : θ ≠ θ0. The p-value under the null hypothesis is then
where
Thus we reject the null hypothesis when the p-value is not larger than a given level α.
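Both kinds of p-value can be approximated by counting accepted pairs. In the sketch below, `region_p_value` follows the proportion described for the region hypothesis. `two_sided_p_value` assumes the common form 2 min{F(g0), 1 − F(g0)} for a scalar function, with F the empirical distribution of the sampled values; this form is an assumption and may differ from the paper's definition. Both function names are illustrative.

```python
def region_p_value(pairs, in_null_region):
    """Proportion of sampled (mu, sigma) pairs that fall in the null region."""
    hits = sum(1 for mu, sigma in pairs if in_null_region(mu, sigma))
    return hits / len(pairs)


def two_sided_p_value(g_values, g0):
    """Assumed two-sided p-value: 2 * min(F(g0), 1 - F(g0))."""
    n = len(g_values)
    below = sum(1 for g in g_values if g <= g0) / n
    return 2.0 * min(below, 1.0 - below)


# Toy usage with four hypothetical accepted pairs.
pairs = [(0.1, 1.0), (-0.2, 1.0), (0.3, 2.0), (0.5, 0.5)]
p_region = region_p_value(pairs, lambda mu, sigma: mu > 0)
p_two_sided = two_sided_p_value([mu for mu, _ in pairs], g0=0.2)
```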
3 Simulation study
In this section we illustrate the performance of our confidence intervals and hypothesis tests when the sample size is moderate. We take δ(μ, σ) = Φ((a − μ)/σ). Without loss of generality, we take a = 0. Otherwise, for each nonzero observation Xi, let Yi = Xi e^{−a}. Then log Yi ∼ N(μ − a, σ2), which means that we take μ − a as the new location parameter. So we consider δ(μ, σ) = Φ(−μ/σ). We can sample from the updated fiducial distribution using the method we proposed. Three simulation studies are conducted in this section. The first simulation study gives the interval estimates of the parameters of the associated delta-lognormal distribution and compares them with those obtained from the fiducial distribution to illustrate the improvement. In the second simulation study, we compare the estimates of the population mean under the associated delta-lognormal distribution and the traditional one. In the last simulation study, we focus on estimation and hypothesis testing for δ in the associated delta-lognormal distribution when δ = 0.5, and again compare the results with those of the traditional one.
3.1 Simulation study I
In this simulation study we consider the estimation of μ and σ. The sample sizes considered are n = 20, 30, 50 and 100. We set the value of σ to 0.5, 1 and 2, while the value of μ is changed to make the corresponding δ = Φ(−μ/σ) approximately equal to 0.6, 0.5, 0.4, 0.3 and 0.15. In particular, when σ = 1, the corresponding values of μ are −0.25, 0, 0.25, 0.5 and 1. For each parameter setting, we generate 1000 repetitions, and for each one we sample N = 4000 pairs of (μ, σ) and calculate the 95% confidence intervals of μ and σ using Eq (16). We compare the results with the estimates obtained from the fiducial distribution. The results are shown in Figs 1 and 2, and the details are given in the Tables in S1 File. The figures show the confidence intervals of μ and σ when σ = 1 and n = 20, 30, 50 and 100. The horizontal coordinates are the values μ = −0.25, 0, 0.25, 0.5 and 1, while the vertical coordinates are the confidence intervals of μ or σ for the different values of μ. The four plots from left to right and from top to bottom correspond to n = 20, 30, 50 and 100, respectively. The plots for σ = 0.5 and 2 are quite similar to those for σ = 1, so we omit those figures.
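The μ grid follows from inverting δ = Φ(−μ/σ), which gives μ = −σ Φ⁻¹(δ). A small sketch of this inversion is shown below; the function name `mu_for_delta` is illustrative.

```python
from statistics import NormalDist


def mu_for_delta(delta, sigma):
    """Solve delta = Phi(-mu / sigma) for mu."""
    return -sigma * NormalDist().inv_cdf(delta)


# Location values for sigma = 1 and the target zero probabilities.
targets = (0.6, 0.5, 0.4, 0.3, 0.15)
mu_grid = [mu_for_delta(d, 1.0) for d in targets]
```

For σ = 1 the exact solutions are close to, but not exactly, the rounded values −0.25, 0, 0.25, 0.5 and 1 used in the study, which is why the resulting δ values are described as approximate.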
For this specific δ, we can see that the estimate of μ is largely improved. The lower limits for μ become larger compared with those from the fiducial distribution, while the upper limits become smaller. This leads to a considerably shorter confidence interval that retains the coverage probability. However, the impact on σ is not as apparent as that on μ. The average length of the confidence intervals for σ generally becomes smaller than that of the fiducial distribution as δ and the sample size n decrease. The lower limits seem always to be larger than those of the fiducial distribution, while the upper limits gradually become smaller as δ and the sample size increase. We also notice that the distribution of σ is asymmetric, so we suggest using the 2% and 97% quantiles of the sampled σ to construct the 95% confidence interval of σ.
3.2 Simulation study II
In this simulation we consider inference on the log population mean of the associated delta-lognormal distribution, which has the form
$$\log M = \log[1 - \delta(\mu, \sigma)] + \mu + \frac{\sigma^2}{2}. \tag{17}$$
The population mean of the delta-lognormal distribution plays a crucial role in statistical analysis and inference. It is a measure of central tendency, providing a summary of the central location of the distribution. For example, in the real data of our paper, we estimate the diagnostic test charges of the patients. The true values of the parameters and the sample sizes are set as in the previous simulation. We first consider point estimates of the log population mean, namely the “posterior mean” and the “posterior median”. The former is approximated by
$$\frac{1}{N}\sum_{i=1}^{N} g(\mu_i, \sigma_i), \tag{18}$$
where g(μ, σ) is the log population mean in (17),
while the latter is approximated by the 0.5 quantile of the N accepted values. We compute the mean bias and the mean squared error of these two estimates and compare them with those of [14]. To obtain the estimate of Krishnamoorthy, we compute the mean of “Qtheta” in his paper. The results for σ = 1 are shown in Table 1; those for σ = 0.5 and 2 can be found in S1 File. “MB”, “MDB” and “GQB” stand for the mean bias of the posterior mean, the posterior median and the estimate using the generalized quantity in [14], respectively. “MSE” stands for the mean squared error, and the subscripts indicate the three estimates. We also use Fig 3 for a better view of the two estimates. It should be noted that some extreme cases may occur when δ is large and the sample size is small, as shown in the first plot of Fig 3 where n = 20. In these extreme cases there are three or fewer nonzero observations, which makes the estimates far from the true value, so the mean bias and mean squared error become meaningless. We use blanks to indicate this problem. Nevertheless, the posterior median seems to be the better point estimate of the population mean. Its mean bias and mean squared error are generally smaller, especially when σ is large.
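The two point estimates can be computed from the accepted pairs in a few lines. The sketch below assumes δ(μ, σ) = Φ(−μ/σ), as in this section, and writes the log population mean as log(1 − δ) + μ + σ²/2, which follows from the population mean (1 − δ(μ, σ)) exp(μ + σ²/2). The function names are illustrative.

```python
import math
from statistics import NormalDist, mean, median


def log_population_mean(mu, sigma):
    """log[(1 - delta) * exp(mu + sigma^2 / 2)] with delta = Phi(-mu / sigma)."""
    delta = NormalDist().cdf(-mu / sigma)
    return math.log(1.0 - delta) + mu + sigma ** 2 / 2.0


def point_estimates(pairs):
    """Return the 'posterior mean' and 'posterior median' of the log mean."""
    g = [log_population_mean(mu, sigma) for mu, sigma in pairs]
    return mean(g), median(g)


# Toy usage with three hypothetical accepted pairs.
post_mean, post_median = point_estimates([(0.0, 1.0), (0.2, 0.9), (-0.1, 1.1)])
```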
3.3 Simulation study III
In this simulation we consider the case when δ = 0.5, which happens when μ = 0. We fix μ at 0 and take σ = 0.5, 1 and 2. The sample sizes range from 20 to 100. We show in Table 2 the asymptotic 95% confidence intervals of δ. The estimate of δ is compared with that of the generalized fiducial distribution proposed by Hannig, which is a Beta distribution. It can be seen that the average length is smaller, which means that the estimate becomes more accurate. To illustrate this idea, we also test hypotheses on δ ranging from 0.1 to 0.9 for the case δ = Φ(−μ/σ) and calculate the p-value under the null hypothesis. In fact, we can consider any function of μ and σ after drawing pairs of parameters from the posterior distribution. The null hypothesis is set to δ = 0.1, 0.3, 0.5, 0.7 and 0.9. We calculate the p-value for the associated delta-lognormal distribution and compare the result with the traditional one, which uses the Beta distribution Beta(n0 + 0.5, n1 + 0.5) as the generalized fiducial distribution for δ. For each given setting we generate 10000 samples and accept N = 10000 pairs of parameters. We calculate pi = Φ(−μi/σi) for i = 1 to N and calculate the p-value for δ = p0, which is
The results are shown in Table 3. A and D in the column named method represent the associated delta-lognormal distribution and the traditional one, respectively. We can see that the p-values of the same null hypothesis are more concentrated for the associated delta-lognormal distribution than for the traditional delta-lognormal distribution. This means that, when the null hypothesis is false, we are more likely to reject it under the associated delta-lognormal distribution than under the traditional one.
4 A real data example
In this section, we use the data set of diagnostic test charges from the study in [27], see Table 4. This data set was analysed by [7], who showed that the positive part fits a lognormal distribution. The data set was further studied by [9, 14]. It contains 40 patients, 10 of whom had no diagnostic tests during the study period.
We assume that the data set comes from an associated delta-lognormal population, where
It can be calculated that
. We assume that the data are drawn from the associated delta-lognormal distribution below,
where
To test the goodness of fit, we choose k = 4 and create the partition with a1, a2, a3, a4 equal to 250, 500, 900 and 3000, respectively. Given the level α = 0.05, the test statistic (7) is 3.916, which is smaller than the corresponding chi-square critical value. Thus the model assumption is accepted.
We give the confidence interval of the population mean using the method proposed in this paper. We accept N = 10000 pairs of (μ, σ) and calculate the 2.5% and 97.5% quantiles. As mentioned in Section 3, the 2% and 97% quantiles are also considered since the distribution of σ is asymmetric. The result is compared with the fiducial method proposed by [14] and the “MOVER” method proposed by [16], see Table 5. It can be seen that the confidence interval is largely improved.
5 Results and discussion
In this paper, we consider the associated delta-lognormal distribution, in which δ is associated with the location and scale parameters of the lognormal distribution. To combine the information in the lognormal distribution with the discrete binomial distribution, we propose the updated fiducial distribution. We establish that the confidence interval has an asymptotically correct level and that the significance level of the hypothesis testing is also asymptotically correct. To obtain the confidence intervals and the p-values, we suggest using rejection sampling, motivated by approximate Bayesian computation, to sample from the distributions. The “prior distribution” for μ and σ is chosen to be the fiducial distribution. The binomial likelihood function can be regarded as an update of the fiducial distribution. We further make inference on functions of the parameters. We use the special case δ = Φ(−μ/σ) to illustrate our idea. We give the confidence intervals of μ and σ for different sample sizes and propose a method for testing hypotheses on functions of μ and σ. For cases with both continuous and discrete data, we suggest first obtaining information from the continuous data. Such information is synthesized as a distribution, such as the fiducial distribution. The distribution is then updated by the discrete data through Bayes’ theorem. For further study, motivated by the research on the delta-lognormal distribution, see for example [14, 17, 28], the difference or ratio between the parameters of two associated delta-lognormal distributions can be of interest, as well as the quantiles of the distribution.
References
- 1. Aitchison J. On the Distribution of a Positive Random Variable Having a Discrete Probability Mass at the Origin. Journal of the American Statistical Association 1955, 50, 901–908.
- 2. Owen W.J.; DeRouen T.A. Estimation of the Mean for Lognormal Data Containing Zeroes and Left-Censored Values, with Applications to the Measurement of Worker Exposure to Air Contaminants. Biometrics 1980, 36, 707.
- 3. Lo N.C.H.; Jacobson L.D.; Squire J.L. Indices of Relative Abundance from Fish Spotter Data based on Delta-Lognormal Models. Canadian Journal of Fisheries and Aquatic Sciences 1992, 49, 2515–2526.
- 4. Pennington M. On Testing the Robustness of Lognormal-based estimators. Biometrics 1991, 47, 1623–1624.
- 5. Smith S.J. Evaluating the efficiency of the δ-distribution mean estimator. Biometrics 1988, 44, 485–493.
- 6. Zhou X.H.; Tu W. Comparison of Several Independent Population Means When Their Samples Contain Log-Normal and Possibly Zero Observations. Biometrics 1999, 55, 645–651.
- 7. Zhou X.H.; Tu W. Confidence Intervals for the Mean of Diagnostic Test Charge Data Containing Zeros. Biometrics 2000, 56, 1118–1125. pmid:11129469
- 8. Fletcher D. Confidence intervals for the mean of the delta-lognormal distribution. Environmental and Ecological Statistics 2007, 15, 175–189.
- 9. Tian L. Inferences on the mean of zero-inflated lognormal data: the generalized variable approach. Statistics in Medicine 2005, 24, 3223–3232. pmid:16189811
- 10. Tian L.; Wu J. Confidence Intervals for the Mean of Lognormal Data with Excess Zeros. Biometrical Journal 2006, 48, 149–156. pmid:16544820
- 11. Tsui K.W.; Weerahandi S. Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. Journal of the American Statistical Association 1989, 84, 602–607.
- 12. Li X.; Zhou X.; Tian L. Interval estimation for the mean of lognormal data with excess zeros. Statistics and Probability Letters 2013, 83, 2447–2453.
- 13. Wu W.H.; Hsieh H.N. Generalized confidence interval estimation for the mean of delta-lognormal distribution: an application to New Zealand trawl survey data. Journal of Applied Statistics 2014, 41, 1471–1485.
- 14. Hasan M.S.; Krishnamoorthy K. Confidence intervals for the mean and a percentile based on zero-inflated lognormal data. Journal of Statistical Computation and Simulation 2018, 88, 1499–1514.
- 15. Hannig J. On Generalized Fiducial Inference. Statistica Sinica 2009, 19, 491–544.
- 16. Zou G.Y.; Taleban J.; Huo C.Y. Confidence interval estimation for lognormal data with application to health economics. Computational Statistics and Data Analysis 2009, 53, 3755–3764.
- 17. Harvey J.; van der Merwe A. Bayesian confidence intervals for means and variances of lognormal and bivariate lognormal distributions. Journal of Statistical Planning and Inference 2012, 142, 1294–1309.
- 18. Singh K.; Xie M.; Strawderman W.E. Combining information from independent sources through confidence distributions. The Annals of Statistics 2005, 33.
- 19. Marjoram P.; Molitor J.; Plagnol V.; Tavaré S. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 2003, 100, 15324–15328. pmid:14663152
- 20. Sisson S.A.; Fan Y.; Tanaka M.M. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 2007, 104, 1760–1765. pmid:17264216
- 21. Del Moral P.; Doucet A.; Jasra A. Sequential Monte Carlo Samplers. Journal of the Royal Statistical Society Series B: Statistical Methodology 2006, 68, 411–436.
- 22. Dawid A.P.; Stone M. The functional-model basis of fiducial inference. The Annals of Statistics 1982, 10, 1054–1074.
- 23. Liu L.; Shih Y.C.T.; Strawderman R.L.; Zhang D.; Johnson B.A.; Chai H. Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review. Statistical Science 2019, 34.
- 24. van der Vaart A.W. Asymptotic Statistics; Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 1998.
- 25. Pritchard J.K.; Seielstad M.T.; Perez-Lezaun A.; Feldman M.W. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution 1999, 16, 1791–1798. pmid:10605120
- 26. Tavaré S.; Balding D.J.; Griffiths R.C.; Donnelly P. Inferring Coalescence Times From DNA Sequence Data. Genetics 1997, 145, 505–518. pmid:9071603
- 27. Callahan C.M. Association of Symptoms of Depression with Diagnostic Test Charges among Older Adults. Annals of Internal Medicine 1997, 126, 426. pmid:9072927
- 28. Maneerat P.; Niwitpong S.A.; Niwitpong S. Bayesian confidence intervals for a single mean and the difference between two means of delta-lognormal distributions. Communications in Statistics - Simulation and Computation 2021, 50, 2906–2934.