Confidence intervals for rainfall dispersions using the ratio of two coefficients of variation of lognormal distributions with excess zeros

Rainfall fluctuation is directly affected by the Earth’s climate change. It can be described using the coefficient of variation (CV). Similarly, the ratio of CVs can be used to compare the rainfall variation between two regions. The ratio of CVs has been widely used in statistical inference in a number of applications, and the confidence interval constructed with this statistic is also of interest. In this paper, confidence intervals for the ratio of two independent CVs of lognormal distributions with excess zeros are proposed using the fiducial generalized confidence interval (FGCI), Bayesian methods based on the left-invariant Jeffreys, Jeffreys rule, and uniform priors, and the Wald and Fieller log-likelihood methods. The results of a simulation study reveal that the highest posterior density (HPD) Bayesian method using the Jeffreys rule prior performed the best in terms of coverage probability and average length for almost all cases with a small sample size, and for cases with a large sample size together with a large variance and a small proportion of non-zero values. The performance of the methods is demonstrated on two rainfall datasets from the central and southern regions in Thailand.


Introduction
The Earth's climate is changing due to increased greenhouse gas emissions from human activities, and climate change has resulted in dramatic weather events such as heatwaves, heavy rainfall, and droughts. Thailand is a country situated in the southeastern region of Asia that is affected by the southwest and northeast monsoons at different times of the year [1]. The country is divided into five regions: North, Northeast, Central, East, and South. Over the past three decades, Thailand has suffered from increased temperatures and fluctuating rainfall. In this study, rainfall is of interest because too much rain causes flooding and too little causes droughts at different times of the year in each area of the country. Rainfall fluctuation can be described using the CV, meaning that the ratio of CVs can be used to compare the rainfall variation between two regions. The lognormal distribution with excess zeros, a mixed distribution of discrete and continuous random variables, has been applied in many studies. The methods for constructing the confidence intervals are presented first, followed by simulation studies involving the proposed methods. Next, application of the proposed methods to rainfall datasets from two regions in Thailand is demonstrated. Last, the paper is brought to a close with a discussion and conclusion.

Materials and methods
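Let $X_{ij} = (X_{i1}, X_{i2}, \ldots, X_{in_i})$, for $i = 1, 2$ and $j = 1, 2, \ldots, n_i$, be a semi-continuous random sample that follows a lognormal distribution with excess zeros with the probability of zero values $\delta_{i,0}$, mean $\mu_i$, and variance $\sigma_i^2$, denoted by $X_{ij} \sim \Delta(\delta_{i,0}, \mu_i, \sigma_i^2)$. The number of zero observations has a binomial distribution, while the non-zero observations follow a lognormal distribution. The numbers of zero and non-zero observations are defined as $n_{i,0}$ and $n_{i,1}$, respectively, where $n_i = n_{i,0} + n_{i,1}$. This leads to the distribution function of $X_{ij}$:
$$H(x_{ij}; \delta_{i,0}, \mu_i, \sigma_i^2) = \begin{cases} \delta_{i,0}, & x_{ij} = 0, \\ \delta_{i,0} + (1 - \delta_{i,0})\, H(x_{ij}; \mu_i, \sigma_i^2), & x_{ij} > 0, \end{cases} \tag{1}$$
where $\delta_{i,0} = P(X_{ij} = 0)$, $n_{i,0} \sim B(n_i, \delta_{i,0})$ [6], and $H(x_{ij}; \mu_i, \sigma_i^2)$ is a lognormal cumulative distribution function [11], so $\ln X_{ij}$ follows a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$ for $X_{ij} > 0$. Thus, the probability density function of $X_{ij}$ can be expressed as
$$f(x_{ij}; \delta_{i,0}, \mu_i, \sigma_i^2) = \delta_{i,0}\, I_0[x_{ij}] + (1 - \delta_{i,0})\, \frac{1}{x_{ij}\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(\ln x_{ij} - \mu_i)^2}{2\sigma_i^2}\right] I_{(0,\infty)}[x_{ij}], \tag{2}$$
such that if $x_{ij} = 0$, then $I_0[x_{ij}] = 1$ and $I_{(0,\infty)}[x_{ij}] = 0$, and if $x_{ij} > 0$, then $I_0[x_{ij}] = 0$ and $I_{(0,\infty)}[x_{ij}] = 1$. According to Aitchison [30], the population mean and variance of $X_{ij}$ are $\mu_{X_{ij}} = \delta_{i,1} \exp(\mu_i + \sigma_i^2/2)$ and $\sigma^2_{X_{ij}} = \delta_{i,1} \exp(2\mu_i + \sigma_i^2)[\exp(\sigma_i^2) - \delta_{i,1}]$, respectively, where $\delta_{i,1} = 1 - \delta_{i,0}$. Thus, the CV of $X_{ij}$ can be defined as
$$\eta_i = \frac{\sigma_{X_{ij}}}{\mu_{X_{ij}}} = \sqrt{\frac{\exp(\sigma_i^2) - \delta_{i,1}}{\delta_{i,1}}}. \tag{3}$$
The aim here is to construct the confidence interval for the ratio of the CVs:
$$\phi = \frac{\eta_1}{\eta_2} = \sqrt{\frac{\exp(\sigma_1^2) - \delta_{1,1}}{\delta_{1,1}}} \Bigg/ \sqrt{\frac{\exp(\sigma_2^2) - \delta_{2,1}}{\delta_{2,1}}}. \tag{4}$$
In accordance with a lognormal distribution with excess zeros, the maximum likelihood estimators (MLEs) of $\delta_{i,0}$, $\mu_i$, and $\sigma_i^2$ are $\hat\delta_{i,0} = n_{i,0}/n_i$, $\hat\mu_i = n_{i,1}^{-1} \sum_{j: x_{ij} > 0} \ln x_{ij}$, and $\hat\sigma_i^2 = n_{i,1}^{-1} \sum_{j: x_{ij} > 0} (\ln x_{ij} - \hat\mu_i)^2$, respectively. The approaches used to construct the confidence intervals are described in the following subsections.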
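The point estimate of the ratio can be sketched in a few lines. The Python example below is a minimal illustration, not the authors' code: it computes the MLEs of the zero proportion and the log-scale variance, then evaluates the CV as $\eta_i = \sqrt{[\exp(\sigma_i^2) - \delta_{i,1}]/\delta_{i,1}}$, which follows from the Aitchison mean and variance given above. The function names `delta_lognormal_mle`, `cv`, and `cv_ratio` are hypothetical.

```python
import math

def delta_lognormal_mle(x):
    """Return (delta0_hat, mu_hat, sigma2_hat) for a sample containing zeros."""
    n = len(x)
    logs = [math.log(v) for v in x if v > 0]   # logs of the non-zero observations
    n1 = len(logs)                             # number of non-zero observations
    if n1 < 2:
        raise ValueError("need at least two non-zero observations")
    delta0_hat = (n - n1) / n                  # proportion of zeros
    mu_hat = sum(logs) / n1
    # MLE of the log-scale variance (divisor n1, not n1 - 1)
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / n1
    return delta0_hat, mu_hat, sigma2_hat

def cv(delta0, sigma2):
    """CV of a lognormal distribution with excess zeros."""
    delta1 = 1.0 - delta0
    return math.sqrt((math.exp(sigma2) - delta1) / delta1)

def cv_ratio(x1, x2):
    """Plug-in estimate of phi = eta_1 / eta_2 from two independent samples."""
    d1, _, s1 = delta_lognormal_mle(x1)
    d2, _, s2 = delta_lognormal_mle(x2)
    return cv(d1, s1) / cv(d2, s2)
```

Note that $\mu_i$ is estimated but never used in the CV, since the CV depends only on $\delta_{i,1}$ and $\sigma_i^2$.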

Bayesian methods
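The probability density function of a lognormal distribution with excess zeros (Eq (2)) has unknown parameters $\delta_{i,0}$, $\mu_i$, and $\sigma_i^2$. The joint likelihood function is defined as
$$L(\delta_{1,0}, \mu_1, \sigma_1^2, \delta_{2,0}, \mu_2, \sigma_2^2 \mid x) \propto \prod_{i=1}^{2} \delta_{i,0}^{n_{i,0}} (1 - \delta_{i,0})^{n_{i,1}} (\sigma_i^2)^{-n_{i,1}/2} \exp\left[-\frac{1}{2\sigma_i^2} \sum_{j: x_{ij} > 0} (\ln x_{ij} - \mu_i)^2\right]. \tag{10}$$
Since we are interested in the ratio of the CVs, the Fisher information matrix of the unknown parameters $(\delta_{1,0}, \mu_1, \sigma_1^2, \delta_{2,0}, \mu_2, \sigma_2^2)$, computed from the second-order derivatives of the log-likelihood function, can be expressed as
$$I(\theta) = \mathrm{diag}\left[\frac{n_1}{\delta_{1,0}(1 - \delta_{1,0})},\ \frac{n_1(1 - \delta_{1,0})}{\sigma_1^2},\ \frac{n_1(1 - \delta_{1,0})}{2\sigma_1^4},\ \frac{n_2}{\delta_{2,0}(1 - \delta_{2,0})},\ \frac{n_2(1 - \delta_{2,0})}{\sigma_2^2},\ \frac{n_2(1 - \delta_{2,0})}{2\sigma_2^4}\right].$$
Subsequently, the equitailed confidence intervals and the HPD intervals are constructed for the left-invariant Jeffreys, the Jeffreys rule, and the uniform priors.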
The left-invariant Jeffreys prior. Because the lognormal distribution with excess zeros is a combination of binomial and lognormal distributions, the Jeffreys priors for $\delta_{i,0}$ and $\sigma_i^2$ are computed under these distributions. According to Jeffreys [38] and Ghosh et al. [39], the invariant Jeffreys prior is obtained from the Fisher information matrix $I(\theta)$ as $p(\theta) \propto \sqrt{|I(\theta)|}$, where $|I(\theta)|$ is the determinant of $I(\theta)$. Subsequently, the posterior densities of $\delta_{i,0}$ and $\sigma_i^2$ follow a beta distribution, $\mathrm{Beta}(n_{i,0} + 1/2,\ n_{i,1} + 3/2)$, and an inverse gamma distribution, $\mathrm{IG}(n_{i,1}/2,\ n_{i,1}\hat\sigma_i^2/2)$, respectively.
The uniform prior. For the uniform prior, the prior probability is a constant function whereby all possible values are equally likely a priori [41,42]. Accordingly, for binomial and lognormal distributions, the uniform priors of $\delta_{i,0}$ and $\sigma_i^2$ are proportional to 1 [43,44], which implies that the uniform prior for a lognormal distribution with excess zeros is $p(\delta_{i,0}, \sigma_i^2) \propto 1$. From the resulting joint posterior distribution, the posterior distribution of $\delta_{i,0}$ is a beta distribution, $\mathrm{Beta}(n_{i,0} + 1,\ n_{i,1} + 1)$, and that of $\sigma_i^2$ is an inverse gamma distribution, $\sigma_i^2 \mid x_{ij} \sim \mathrm{IG}[(n_{i,1} - 2)/2,\ (n_{i,1} - 2)\hat\sigma_i^2/2]$. Draws from the posterior distributions of $\delta_{i,0}$ and $\sigma_i^2$ are substituted into Eq (4), and then the equitailed confidence intervals and HPD intervals are constructed using Algorithm 2.
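A Monte Carlo version of the uniform-prior construction might look like the following Python sketch. It is not Algorithm 2 itself but an illustration under stated assumptions: posterior draws come from $\mathrm{Beta}(n_{i,0} + 1, n_{i,1} + 1)$ and $\mathrm{IG}[(n_{i,1} - 2)/2, (n_{i,1} - 2)\hat\sigma_i^2/2]$ as given above, the CV is computed as $\sqrt{[\exp(\sigma_i^2) - \delta_{i,1}]/\delta_{i,1}}$, the value `sigma2_hat` is supplied by the caller, and the HPD interval is approximated by the shortest window over the sorted draws. All function names are hypothetical.

```python
import math
import random

def cv(delta0, sigma2):
    """CV of a lognormal distribution with excess zeros."""
    delta1 = 1.0 - delta0
    return math.sqrt((math.exp(sigma2) - delta1) / delta1)

def posterior_cv_draws(n0, n1, sigma2_hat, draws, rng):
    """Draw CVs from the uniform-prior posteriors of delta0 and sigma^2."""
    shape = (n1 - 2) / 2.0
    rate = (n1 - 2) * sigma2_hat / 2.0
    out = []
    for _ in range(draws):
        delta0 = rng.betavariate(n0 + 1, n1 + 1)
        # gammavariate takes (shape, scale); if G ~ Gamma(shape, scale=1/rate),
        # then 1/G ~ inverse gamma with parameters (shape, rate)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        out.append(cv(delta0, sigma2))
    return out

def equal_tailed(samples, alpha=0.05):
    """Interval between the alpha/2 and 1 - alpha/2 empirical quantiles."""
    s = sorted(samples)
    m = len(s)
    return s[round(alpha / 2 * m)], s[round((1 - alpha / 2) * m) - 1]

def hpd(samples, alpha=0.05):
    """Shortest interval containing a (1 - alpha) share of the sorted draws."""
    s = sorted(samples)
    m = len(s)
    k = round((1 - alpha) * m)                     # draws kept inside the interval
    best = min(range(m - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]

def bayes_uniform_intervals(n10, n11, s1, n20, n21, s2,
                            draws=5000, alpha=0.05, seed=1):
    """Equal-tailed and HPD intervals for phi = eta_1 / eta_2."""
    rng = random.Random(seed)
    eta1 = posterior_cv_draws(n10, n11, s1, draws, rng)
    eta2 = posterior_cv_draws(n20, n21, s2, draws, rng)
    phi = [a / b for a, b in zip(eta1, eta2)]
    return equal_tailed(phi, alpha), hpd(phi, alpha)
```

Because the posterior of the ratio is right-skewed, the HPD interval is typically shorter than the equal-tailed one. The other two priors would change only the beta and inverse gamma parameters in `posterior_cv_draws`.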

The Wald log-likelihood method
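According to Eq (10), the log-likelihood function is
$$\ln L \propto n_{1,0} \ln \delta_{1,0} + n_{2,0} \ln \delta_{2,0} + n_{1,1} \ln(1 - \delta_{1,0}) + n_{2,1} \ln(1 - \delta_{2,0}) - \frac{1}{2}\left(n_{1,1} \ln \sigma_1^2 + n_{2,1} \ln \sigma_2^2\right) - \frac{1}{2\sigma_1^2} \sum_{j: x_{1j} > 0} (\ln x_{1j} - \mu_1)^2 - \frac{1}{2\sigma_2^2} \sum_{j: x_{2j} > 0} (\ln x_{2j} - \mu_2)^2. \tag{17}$$
From Eq (4), the parameter of interest is $\phi$. Subsequently, the log-likelihood function is reparameterized in terms of $\phi$ by substituting $\eta_1 = \phi\eta_2$, so that
$$\sigma_1^2 = \ln(1 - \delta_{1,0}) + \ln(\phi^2\eta_2^2 + 1), \qquad \sigma_2^2 = \ln(1 - \delta_{2,0}) + \ln(\eta_2^2 + 1). \tag{18}$$
Theorem 1. Let $\phi = \eta_1/\eta_2$ be the ratio of CVs of lognormal distributions with excess zeros. The unrestricted MLE of $\phi$ is $\hat\phi = \hat\eta_1/\hat\eta_2$, where $\hat\eta_i = \sqrt{[\exp(\hat\sigma_i^2) - \hat\delta_{i,1}]/\hat\delta_{i,1}}$, and its asymptotic variance $\mathrm{Var}(\hat\phi)$ is obtained from the inverse of the Fisher information matrix.
Proof. Following Nam and Kwon [16], the log-likelihood function is reparameterized by substituting Eq (18) into Eq (17). The asymptotic variance of $\hat\phi$ is obtained using the Fisher information matrix $I_n(\theta)$, whose elements are computed from the second-order partial derivatives of the reparameterized log-likelihood function. The asymptotic variance of $\hat\phi$ is then taken from the upper left-hand block $I^{11}$ of the inverse matrix $I_n^{-1}(\theta)$, where $\eta_1 = \phi\eta_2$. The statistic
$$Z = \frac{\hat\phi - \phi}{\sqrt{\widehat{\mathrm{Var}}(\hat\phi)}}$$
is asymptotically standard normal. Therefore, the $100(1 - \alpha)\%$ two-sided confidence interval for $\phi$ based on the Wald log-likelihood method is
$$CI_W = \left[\hat\phi - z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat\phi)},\ \hat\phi + z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat\phi)}\right],$$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$-th percentile of the standard normal distribution.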
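To make the Wald construction concrete, the Python sketch below computes an interval of the form $\hat\phi \pm z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\hat\phi)}$. The variance used here is a first-order delta-method approximation, which is an assumption of this sketch and may differ in form from the expression in Theorem 1: it takes $\mathrm{Var}(\hat\delta_{i,1}) \approx \delta_{i,1}(1 - \delta_{i,1})/n_i$ and $\mathrm{Var}(\hat\sigma_i^2) \approx 2\sigma_i^4/(n_i\delta_{i,1})$, treats the two estimators as independent, and uses $\eta_i^2 = \exp(\sigma_i^2)/\delta_{i,1} - 1$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def cv_and_var(n, delta1, sigma2):
    """CV and its delta-method variance (an assumed approximation)."""
    eta_sq = math.exp(sigma2) / delta1 - 1.0
    eta = math.sqrt(eta_sq)
    # (d eta / d delta1)^2 Var(delta1_hat) + (d eta / d sigma^2)^2 Var(sigma2_hat)
    var = ((eta_sq + 1.0) ** 2 * ((1.0 - delta1) + 2.0 * sigma2 ** 2)
           / (4.0 * n * delta1 * eta_sq))
    return eta, var

def wald_ci(n1, delta11, s1, n2, delta21, s2, alpha=0.05):
    """Wald-type interval for phi = eta_1 / eta_2 from plug-in estimates."""
    eta1, v1 = cv_and_var(n1, delta11, s1)
    eta2, v2 = cv_and_var(n2, delta21, s2)
    phi = eta1 / eta2
    # Delta method for a ratio of two independent estimators
    var_phi = phi ** 2 * (v1 / eta1 ** 2 + v2 / eta2 ** 2)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * math.sqrt(var_phi)
    return phi - half, phi + half
```

With $\delta_{i,1} = 1$ the variance in `cv_and_var` reduces to $\sigma_i^4(1 + \eta_i^2)^2/(2 n_i \eta_i^2)$, the usual delta-method variance of a lognormal CV. The interval is symmetric about $\hat\phi$ by construction, so its lower bound can fall below zero in small samples.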

The Fieller log-likelihood method
Following Eq (17) and since $\sigma_i^2 = \ln(1 - \delta_{i,0}) + \ln(\eta_i^2 + 1)$, the log-likelihood function can be written as
$$\ln L \propto \sum_{i=1}^{2} \left\{ n_{i,0} \ln \delta_{i,0} + n_{i,1} \ln(1 - \delta_{i,0}) - \frac{n_{i,1}}{2} \ln\left[\ln(1 - \delta_{i,0}) + \ln(\eta_i^2 + 1)\right] - \frac{\sum_{j: x_{ij} > 0} (\ln x_{ij} - \mu_i)^2}{2\left[\ln(1 - \delta_{i,0}) + \ln(\eta_i^2 + 1)\right]} \right\}. \tag{23}$$
Theorem 2. The CV of a lognormal distribution with excess zeros is $\eta_i = \sqrt{[\exp(\sigma_i^2) - \delta_{i,1}]/\delta_{i,1}}$ for $i = 1, 2$.
Proof. The MLEs of the parameters are obtained from the first-order derivatives of Eq (23). Similarly to Theorem 1, the elements $I_{mn}$ of the Fisher information matrix for $m = n = 1, 2$ are obtained from the second-order partial derivatives of Eq (23), $I_{mn}$ for $m = n = 3, 4, 5, 6$ follow from Theorem 1 when $\sigma_i^2 = \ln(1 - \delta_{i,0}) + \ln(\eta_i^2 + 1)$, and $I_{mn} = 0$ for $m, n = 1, 2, \ldots, 6$ and $m \neq n$. The asymptotic variance of $\hat\eta_i$, $\mathrm{Var}(\hat\eta_i)$, is obtained from the inverse of this matrix. Since $\delta_{i,1} = 1 - \delta_{i,0}$ and the MLEs of $\delta_{i,1}$ and $\sigma_i^2$ are $\hat\delta_{i,1}$ and $\hat\sigma_i^2$, respectively, the estimated variance $\widehat{\mathrm{Var}}(\hat\eta_i)$ is obtained by substituting these MLEs. According to Fieller [45], the statistic
$$T = \frac{\hat\eta_1 - \phi\hat\eta_2}{\sqrt{\widehat{\mathrm{Var}}(\hat\eta_1) + \phi^2\, \widehat{\mathrm{Var}}(\hat\eta_2)}}$$
is asymptotically standard normal. Therefore, the $100(1 - \alpha)\%$ two-sided confidence interval for $\phi$ based on the Fieller log-likelihood method is
$$CI_F = \frac{\hat\eta_1\hat\eta_2 \pm \sqrt{(\hat\eta_1\hat\eta_2)^2 - \left[\hat\eta_1^2 - z_{1-\alpha/2}^2\, \widehat{\mathrm{Var}}(\hat\eta_1)\right]\left[\hat\eta_2^2 - z_{1-\alpha/2}^2\, \widehat{\mathrm{Var}}(\hat\eta_2)\right]}}{\hat\eta_2^2 - z_{1-\alpha/2}^2\, \widehat{\mathrm{Var}}(\hat\eta_2)},$$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$-th percentile of a standard normal distribution. Note that the variance of the estimator for the ratio of CVs from Theorem 1 and the variance of the estimator for the CV from Theorem 2 are equal to the variances reported by Nam and Kwon [16] when $\delta_{1,1} = \delta_{2,1} = 1$.
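The Fieller construction amounts to solving a quadratic inequality in $\phi$. The Python sketch below is a minimal illustration under the assumption that the two CV estimators are independent and asymptotically normal, so that $(\hat\eta_1 - \phi\hat\eta_2)^2 \le z_{1-\alpha/2}^2 (V_1 + \phi^2 V_2)$ defines the interval. Here `v1` and `v2` stand for the estimated variances of $\hat\eta_1$ and $\hat\eta_2$ and are supplied by the caller; the function name `fieller_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def fieller_ci(eta1, v1, eta2, v2, alpha=0.05):
    """Fieller-type interval for eta1 / eta2 with independent estimators.

    Returns None when the denominator estimate is not significantly
    different from zero, in which case no bounded interval exists.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Coefficients of a*phi^2 - 2*b*phi + c <= 0
    a = eta2 ** 2 - z ** 2 * v2
    b = eta1 * eta2
    c = eta1 ** 2 - z ** 2 * v1
    if a <= 0:
        return None
    root = math.sqrt(max(b ** 2 - a * c, 0.0))
    return (b - root) / a, (b + root) / a
```

Unlike a Wald interval, the result is generally not symmetric about $\hat\eta_1/\hat\eta_2$, and it collapses to the point estimate when both variances are zero.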

Simulation studies
Simulation studies were conducted to compare the performance of the methods used to construct the confidence intervals: FGCI, the Bayesian methods (with the left-invariant Jeffreys, the Jeffreys rule, and the uniform priors), and the Wald and Fieller log-likelihood methods. The optimal method was the one with a coverage probability equal to or greater than the nominal confidence level of 0.95 together with the shortest average length. Following Wu and Hsieh [46], cases in which the expected number of non-zero values was less than 10 were not considered. The sample sizes ($n_i$), $\delta_{i,1}$, and $\sigma_i^2$ were set as reported in Table 1. For all of the simulations, 15,000 runs were generated, and 5,000 replicates were used for the FGCI and Bayesian methods via Monte Carlo simulation in RStudio version 1.1.463.
Table 1 and Figs 1-3 present the coverage probabilities and average lengths of the confidence intervals for the various methods. The results show that the coverage probabilities of FGCI were consistently close to the nominal confidence level of 0.95 for all cases. The coverage probabilities of the Bayesian method using the uniform prior (B-U) for both the equitailed confidence interval and the HPD interval were greater than or close to the nominal confidence level of 0.95 for all cases. The Bayesian methods using the left-invariant Jeffreys (B-LIJ) and the Jeffreys rule (B-JR) priors based on equitailed confidence intervals and HPD intervals attained coverage probabilities greater than or close to the nominal confidence level of 0.95 in almost every case. However, those attained by the Wald log-likelihood method were less than the nominal confidence level of 0.95 for all cases, whereas those produced by the Fieller log-likelihood method were greater than or close to the nominal confidence level of 0.95 in only some cases.
The average length of B-JR based on the HPD interval was the shortest for most of the cases when the sample size was small ($n_1$ and/or $n_2 = 25$). For large sample sizes ($n_1, n_2 = 50, 100$), B-JR based on the HPD interval generally had narrow average lengths for the cases of $\delta_{1,1}, \delta_{2,1} = 0.2$ for all variances and $\delta_{1,1}, \delta_{2,1} = 0.5, 0.8$ together with $\sigma_1^2, \sigma_2^2 = 1, 2$, while those of FGCI were the shortest for cases with small variance(s) ($\sigma_1^2$ and/or $\sigma_2^2 = 0.5$).
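The two evaluation criteria can be estimated with a simple Monte Carlo loop. The Python sketch below is illustrative only (the study itself used R): it generates pairs of samples from lognormal distributions with excess zeros, applies a user-supplied interval function, and reports the proportion of intervals covering the true ratio together with the mean interval length. The callable `ci_fn`, the default of 1,000 runs, and the choice $\mu_i = 0$ are assumptions of this sketch; the CV does not depend on $\mu_i$.

```python
import math
import random

def r_delta_lognormal(n, delta1, mu, sigma2, rng):
    """Draw n values: zero with probability 1 - delta1, else lognormal(mu, sigma2)."""
    sd = math.sqrt(sigma2)
    return [rng.lognormvariate(mu, sd) if rng.random() < delta1 else 0.0
            for _ in range(n)]

def true_cv(delta1, sigma2):
    """Population CV of a lognormal distribution with excess zeros."""
    return math.sqrt((math.exp(sigma2) - delta1) / delta1)

def coverage_and_length(ci_fn, n1, n2, d1, d2, s1, s2, runs=1000, seed=1):
    """Estimate coverage probability and average length of ci_fn(x1, x2)."""
    rng = random.Random(seed)
    phi = true_cv(d1, s1) / true_cv(d2, s2)    # true ratio of CVs
    hits, total_len = 0, 0.0
    for _ in range(runs):
        x1 = r_delta_lognormal(n1, d1, 0.0, s1, rng)
        x2 = r_delta_lognormal(n2, d2, 0.0, s2, rng)
        lo, hi = ci_fn(x1, x2)
        hits += lo <= phi <= hi                # counts 1 when the interval covers phi
        total_len += hi - lo
    return hits / runs, total_len / runs
```

Any interval construction that maps two samples to a `(lower, upper)` pair can be passed as `ci_fn`, so the same loop compares methods on identical simulated data when the seed is held fixed.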

An empirical example
As previously mentioned, the CV can be used to measure the dispersion in a dataset, especially in cases like rainfall data that conform to a lognormal distribution with excess zeros. Therefore, daily rainfall data from the central and southern regions (Chumphon province) in August 2017, collected by the Central and Southern Region Irrigation Hydrology Center, were used to construct confidence intervals for evaluating the proposed methods. The datasets are shown in Tables 2 and 3, and the results are reported in Table 4.
The lower and upper bounds from the results indicate that the dispersion of rainfall in the central region was greater than in the southern region. This is because the southern region has abundant precipitation throughout the year due to being located on a peninsula surrounded by the Andaman Sea and the Gulf of Thailand. The central region is located on plains, where precipitation is irregular; thus, the dispersion of the rainfall data is larger than in the southern region.

Conclusion
FGCI, Bayesian methods based on the left-invariant Jeffreys, Jeffreys rule, and uniform priors, and the Wald and Fieller log-likelihood methods were used to construct the confidence intervals for the ratio of CVs of lognormal distributions with excess zeros. Coverage probabilities and the average lengths were used to evaluate the performance of the proposed methods.
The simulation results indicate that the coverage probabilities for all cases of the FGCI and Bayesian methods using the uniform prior and almost all cases of the Bayesian method using the left-invariant Jeffreys and Jeffreys rule priors were close to or greater than the target. However, when considering the average lengths, the Bayesian method using the Jeffreys rule prior based on the HPD interval produced the shortest ones in cases of small sample sizes and in cases of a large sample size together with a small expected number of non-zero observations and a large variance, while FGCI was optimal for the other cases. Therefore, the HPD Bayesian method using the Jeffreys rule prior and the FGCI method are suitable for constructing confidence intervals for the ratio of CVs of lognormal distributions with excess zeros. Nam and Kwon [16] introduced the Wald-type and Fieller-type methods for the ratio of CVs of lognormal distributions, which were appropriate for medium sample sizes. In the present study, these methods were extended to lognormal distributions with zero-inflated observations. However, the coverage probabilities of the Wald log-likelihood method were less than the target for all cases, whereas those of the Fieller log-likelihood method were greater than the target in only a few cases, when the probability of non-zero values was more than half and the variance was large. Moreover, the average lengths of these methods were wider than those of the FGCI and Bayesian methods. Hence, the Wald and Fieller log-likelihood methods are not recommended for constructing confidence intervals for the ratio of CVs of lognormal distributions with excess zeros.
Furthermore, the confidence intervals obtained in the empirical study are consistent with the simulation results.