Bounded distributions place limits on skewness and larger moments

Distributions of strictly positive numbers are common and can be characterized by standard statistical measures such as mean, standard deviation, and skewness. We demonstrate that for these distributions the skewness D3 is bounded from below by a function of the coefficient of variation (CoV) δ as D3 > δ − 1/δ. The results are extended to any distribution that is bounded with minimum value xmin and/or bounded with maximum value xmax. We build on the results to provide bounds for kurtosis D4, and conjecture that analogous bounds exist for higher statistical moments.


1 Introduction
One often considers a probability function $P(x)$ of a random variable $X$. Distributions $P(x)$ are characterized by quantities such as mean, median, standard deviation, and skewness. For a continuous random variable $X$, $P(x)$ is the probability density of finding a value $x$ in the range $(x, x + dx)$. For a discrete random variable, $P(x)$ is a discrete probability distribution which assigns a probability $p_i$ to each potential value $x_i$. The skewness is a measure of the asymmetry of a distribution [1]. While there are several possible definitions of skewness [4], a common definition depends on the third moment of the distribution compared to the second moment [5,6]. In particular, one can define the $n$th central moment for continuous variables, or for discrete variables with $N$ possible values, as

$$ m_n = \int (x-\mu)^n P(x)\,dx \quad \text{or} \quad m_n = \sum_{i=1}^{N} p_i\,(x_i-\mu)^n, \qquad (1) $$

where $\mu$ is the mean of the distribution and $\sqrt{m_2} = \sigma$ is the standard deviation. The standardized moments $D_n$ are defined as:

$$ D_n = \frac{m_n}{m_2^{n/2}} = \frac{m_n}{\sigma^n}. \qquad (2) $$

Fig 1 (caption fragment). Decreasing the skewness requires the small circles becoming even smaller (compared to the mean size), as well as decreasing their frequency of occurrence. The decreasing size reaches its natural limit when the small particles have zero size at S = −2.1 for δ = 0.4, as predicted in Eq (4).
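The definitions of the central and standardized moments can be made concrete with a short numerical sketch. The function name `standardized_moment` and the two-valued example distribution are illustrative assumptions, not part of the original text; the code simply evaluates $m_n = \sum_i p_i (x_i - \mu)^n$ and $D_n = m_n / m_2^{n/2}$ for a discrete distribution.

```python
def standardized_moment(values, probs, n):
    """Return D_n = m_n / m_2**(n/2) for a discrete distribution."""
    mu = sum(p * x for x, p in zip(values, probs))

    def central_moment(k):
        # k-th central moment: sum_i p_i * (x_i - mu)**k
        return sum(p * (x - mu) ** k for x, p in zip(values, probs))

    return central_moment(n) / central_moment(2) ** (n / 2)


# Hypothetical example: x = 0 with probability 0.2 and x = 1 with probability 0.8.
values = [0.0, 1.0]
probs = [0.2, 0.8]
skewness = standardized_moment(values, probs, 3)
kurtosis = standardized_moment(values, probs, 4)
print(skewness, kurtosis)
```

Any discrete distribution can be passed in the same way; a continuous distribution would need the sums replaced by numerical integrals.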
Pearson [7] showed that, for any distribution,

$$ D_4 \ge D_3^2 + 1. \qquad (3) $$

Alternate derivations of this result are also in the literature [8,9]. This applies for all distributions. Distributions of strictly positive numbers are often relevant: numbers of objects, sizes of objects such as in Fig 1, ages of people, prices, barometer measurements, etc. Such distributions have only non-negative support; one can more broadly consider distributions with bounded support, with boundaries $x_{\min}$ and/or $x_{\max}$, generically $x_{\rm bound}$. Smołalski [10] worked out upper and lower bounds on the skewness that apply for distributions with bounded support:

$$ D_{3,\min} = \delta_{\min} - \frac{1}{\delta_{\min}}, \qquad D_{3,\max} = \frac{1}{\delta_{\max}} - \delta_{\max}, \qquad (4) $$

with $\delta_{\min} = \sigma/(\mu - x_{\min})$ to determine $D_{3,\min}$ and $\delta_{\max} = \sigma/(x_{\max} - \mu)$ to determine $D_{3,\max}$.
In this paper, we present an alternative derivation for these skewness bounds. Smołalski's derivation relies on the argument that achieving the extrema of skewness requires a bidisperse distribution. We mathematically prove that this is indeed the case in Section 2. Smołalski then uses Lagrange multipliers to derive Eq (4); here, we use calculus to derive this equation and extend it to all real bounds. Our method also applies to higher order standardized moments, for which we find similar bounds in Section 3. We state the bounds, show when their behavior can be used to find the maximum or minimum standardized moment, $D_{n,\mathrm{extr}}$, and conjecture that these extrema apply to all distributions, not just bidisperse ones.
We treat only the skewness of the parent distribution, rather than the sample skewness based on a finite number of samples, which has different limits; see [11]. Note also that there are other definitions of skewness, for example ones that use the median of the distribution as part of the calculation [1], for which other limits exist [12-14].

2 Results for Skewness
We begin with the lowest order nontrivial case, $n = 3$, replicating Smołalski's skewness results. A distribution function with a low value of skewness has small values which rarely occur, for example the smallest circles seen in Fig 1a. A distribution with a high value of skewness is the opposite situation, where the large values rarely occur, for example the largest circles seen in Fig 1d. For a distribution $P(x)$ with only non-negative support, the largest possible values of $x$ are unbounded, but the lowest possible values are bounded by zero. Thus, it makes intuitive sense that the skewness will have a minimum possible value. Our derivation will proceed by first considering bidisperse distributions with non-negative support and showing that, for a fixed $\delta$, the distribution with one value equal to zero achieves the lowest possible skewness. We then show that taking two distributions obeying Eq (4) and considering a weighted sum results in a new distribution that also obeys Eq (4). Next, we argue that any continuous distribution can be approximated by an appropriately weighted sum of bidisperse distributions. In Sec. 2.4 we will conclude by generalizing from distributions with non-negative support to distributions with arbitrary bounds, including those with $\mu \le 0$.
[January 17, 2024]

2.1 Skewness for bidisperse distributions
We start by considering a bidisperse distribution $P(x)$ which takes on values $a_+$, $a_-$ with probabilities $q$, $p = 1 - q$. Following [15], we define the ratio

$$ \eta = \frac{a_+}{a_-} \qquad (5) $$

and focus on $q$ as another important variable describing the distribution. The subscripts indicate that $a_+$ is the value larger than the mean $\mu$ and $a_-$ is the value smaller than $\mu$. Knowing the mean $\mu = q a_+ + (1-q) a_-$ allows us to relate these quantities as

$$ a_- = \frac{\mu}{1 + q(\eta - 1)}, \qquad a_+ = \frac{\eta\,\mu}{1 + q(\eta - 1)}. \qquad (6) $$

Note that a bidisperse distribution with a given $(\eta, q)$ is equivalent to a distribution with $(1/\eta, 1 - q)$ with swapped $a_+$ and $a_-$. A key concept which we will use for much of this derivation is that, in addition to the mean $\mu$, in general knowing any other two quantities related to the distribution will uniquely determine the distribution. Those two quantities could be the values $a_+$ and $a_-$; they could be $\eta$ and $q$ as per Eq (6). Usefully, they can also be the standard deviation and skewness. Thus, we will show that a distribution with $a_-$ achieving the minimum possible value ($a_- = 0$) is one where the skewness $D_3$ achieves its minimum value.
Given a bidisperse distribution defined as above, the standard deviation $\sqrt{m_2} = \sigma$ and skewness $D_3$ are then expressed as

$$ \sigma = (a_+ - a_-)\sqrt{q(1-q)}, \qquad D_3 = \frac{q\,(a_+ - \mu)^3 + (1-q)\,(a_- - \mu)^3}{\sigma^3}. $$
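While $P(x)$ could be a distribution of a quantity with dimensions (such as a probability distribution of weights), our goal is to understand the non-dimensional skewness $D_3$. Thus, rather than considering $\sigma$, which has dimensions of $x$, we will use the non-dimensional quantity "coefficient of variation" (CoV), defined as:

$$ \delta = \frac{\sigma}{\mu}. $$

Here we use the symbol $\delta$, and later in this manuscript we will generalize this symbol beyond the specific meaning of CoV. We can use Eqs (6) to eliminate $a_+$ and $a_-$ from $m_2$ and $D_n$, resulting in

$$ \delta = \frac{\sqrt{m_2}}{\mu} = \frac{(\eta - 1)\sqrt{q(1-q)}}{1 + q(\eta - 1)}, \qquad (9) $$

$$ D_3 = \frac{1 - 2q}{\sqrt{q(1-q)}}. \qquad (10) $$

These require $\eta > 1$. Eqs (9,10) can be inverted to provide expressions for $q$ and $\eta$ in terms of $\delta$ and $D_3$. We include the substitution $M_3 = 4 + D_3^2$, which will be a recurring term:

$$ q = \frac{1}{2}\left(1 \pm \frac{D_3}{\sqrt{M_3}}\right), \qquad (11) $$

$$ \eta = \frac{2 + \delta\left(\sqrt{M_3} \mp D_3\right)}{2 - \delta\left(\sqrt{M_3} \pm D_3\right)}. \qquad (12) $$

These two equations give rise to two branches of solutions depending on whether the upper or lower sign is taken in each equation. Inspection shows that the negative sign in Eq (11), together with the lower signs in Eq (12), arrives back at the classical definition of skewness, whereas the other branch has no significance. For the remainder of our consideration of $D_3$, we will use the negative branch of the solutions and drop the $\pm$ symbol. We continue and calculate the two possible values according to Eq (6):

$$ a_+ = \mu\left[1 + \frac{\delta}{2}\left(\sqrt{M_3} + D_3\right)\right], \qquad (13) $$

$$ a_- = \mu\left[1 - \frac{\delta}{2}\left(\sqrt{M_3} - D_3\right)\right]. \qquad (14) $$

Using Eq (14), we can do a straightforward calculation for the minimum possible skewness $D_3(\delta)$ for bidisperse distributions with $a_+, a_- \ge 0$. A distribution with a low skewness is one that has a small number of small values, and the smallest value possible for a distribution of strictly non-negative numbers is zero. Thus, to find the limit on skewness, we solve Eq (14) for $a_- = 0$. This also implies $a_+ = \mu/q$, since $q$ is the probability of $a_+$. Setting $a_- = 0$ in Eq (14) gives $\sqrt{M_3} = D_3 + 2/\delta$; squaring both sides and using $M_3 = 4 + D_3^2$ lets us solve for $D_{3,\min}$:

$$ D_{3,\min} = \delta - \frac{1}{\delta}. \qquad (15) $$

For example, this gives values $D_{3,\min} = -2.1$ for $\delta = 0.4$, and $D_{3,\min} = 0$ for $\delta = 1$.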
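As a numerical check of this inversion, the sketch below builds a bidisperse distribution from $(\mu, \delta, D_3)$. It assumes the negative-branch expressions take the form $q = \tfrac12(1 - D_3/\sqrt{M_3})$ and $a_\pm = \mu[1 \pm \tfrac{\delta}{2}(\sqrt{M_3} \pm D_3)]$ with $M_3 = 4 + D_3^2$; the function names are illustrative, not the authors'.

```python
import math


def bidisperse_from_moments(mu, delta, d3):
    """Return (a_plus, a_minus, q) for a two-valued distribution with the given mean, CoV and skewness."""
    root_m3 = math.sqrt(4.0 + d3 * d3)       # sqrt(M_3), with M_3 = 4 + D_3**2
    q = 0.5 * (1.0 - d3 / root_m3)           # probability of the larger value a_plus
    a_plus = mu * (1.0 + 0.5 * delta * (root_m3 + d3))
    a_minus = mu * (1.0 - 0.5 * delta * (root_m3 - d3))
    return a_plus, a_minus, q


def min_skewness(delta):
    """Lower bound on skewness for non-negative support: delta - 1/delta."""
    return delta - 1.0 / delta


mu, delta = 1.0, 0.4
a_plus, a_minus, q = bidisperse_from_moments(mu, delta, min_skewness(delta))
print(a_plus, a_minus, q)
```

With the skewness set to its lower bound, the smaller value `a_minus` comes out as zero (up to rounding), which is the limiting case described in the text.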

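2.2 A bidisperse distribution with
For a fixed value of $\delta$, if the minimum value of the distribution $a_-$ is larger than zero, then $D_3$ will increase. This is not straightforward to see from the equations above, but an alternate formulation will work. Define:

$$ \Delta'_+ = a_+ - \mu, \qquad \Delta'_- = \mu - a_-. $$

Using Eqs (6) we can factor out $\mu$ and arrive at the normalized definitions $\Delta_{+,-} = \Delta'_{+,-}/\mu$. We then have the probability of $a_+$ being

$$ q = \frac{\Delta_-}{\Delta_+ + \Delta_-}. $$

We can then get $\delta$ using

$$ \delta^2 = q\,\Delta_+^2 + (1-q)\,\Delta_-^2 = \Delta_+ \Delta_-. $$

Given that we wish to keep $\delta$ constant, we can thus use $\Delta_+ = \delta^2/\Delta_-$ to eliminate $\Delta_+$, leading to

$$ q = \frac{\Delta_-^2}{\delta^2 + \Delta_-^2}. $$

Now consider the third moment of the distribution $m_3$:

$$ m_3 = \mu^3\left[q\,\Delta_+^3 - (1-q)\,\Delta_-^3\right] = \mu^3 \delta^2 \left(\frac{\delta^2}{\Delta_-} - \Delta_-\right). $$

The partial derivative of $m_3$ with respect to $\Delta_-$ holding $\delta$ constant is

$$ \left.\frac{\partial m_3}{\partial \Delta_-}\right|_{\delta} = -\mu^3 \delta^2 \left(\frac{\delta^2}{\Delta_-^2} + 1\right) < 0. $$

Increasing $\Delta_-$ always decreases $m_3$, assuming we keep $\delta$ constant and $\mu$ positive. Likewise, decreasing $\Delta_-$ (making $a_-$ larger than zero) will always increase $m_3$. Since $\sigma = \mu\delta$ is held fixed, making $a_-$ larger than zero must therefore increase the skewness $D_3 = m_3/\sigma^3$. This proves that for the bidisperse distribution with a fixed $\delta$, Eq (15) is indeed the lowest possible skewness.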
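The monotonic behavior can be checked numerically. The sketch below assumes the normalized offsets are defined as $\Delta_+ = (a_+ - \mu)/\mu$ and $\Delta_- = (\mu - a_-)/\mu$, holds $\delta$ fixed through $\Delta_+ = \delta^2/\Delta_-$, and evaluates the third moment directly from the two-valued distribution; the function name and the grid are illustrative choices.

```python
def third_moment(mu, delta, delta_minus):
    """Third central moment of a bidisperse distribution at fixed mean and CoV."""
    delta_plus = delta ** 2 / delta_minus          # keeps the CoV equal to delta
    q = delta_minus / (delta_plus + delta_minus)   # probability of the larger value
    return mu ** 3 * (q * delta_plus ** 3 - (1 - q) * delta_minus ** 3)


mu, delta = 1.0, 0.4
# Delta_- runs from 0.1 up to 1.0; Delta_- = 1 corresponds to a_- = 0.
grid = [0.1 * k for k in range(1, 11)]
m3_values = [third_moment(mu, delta, d) for d in grid]
skewness_values = [m3 / (mu * delta) ** 3 for m3 in m3_values]
print(skewness_values[-1])
```

On this grid the skewness decreases steadily as $\Delta_-$ grows and reaches $\delta - 1/\delta$ at $\Delta_- = 1$.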

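2.3 Generalizations of skewness results
Suppose we have two separate distributions $P_r(x)$ and $P_s(x)$, both with mean $\mu$ and both satisfying the bound of Eq (15). We wish to show that any combination of these two distributions, $P_t(x) = \alpha P_r(x) + (1 - \alpha) P_s(x)$ (with $0 \le \alpha \le 1$), also satisfies Eq (15). Given that the means are identical, it is straightforward that

$$ \delta_t^2 = \alpha\,\delta_r^2 + (1-\alpha)\,\delta_s^2, \qquad m_{3,t} = \alpha\, m_{3,r} + (1-\alpha)\, m_{3,s}. $$

Since $D_3 = m_3/(\mu\delta)^3$, we can rewrite the bound on skewness Eq (15) as

$$ m_3 \ge \mu^3\left(\delta^4 - \delta^2\right). $$

Given that both $P_r$ and $P_s$ satisfy this constraint, we have

$$ m_{3,r} \ge \mu^3\left(\delta_r^4 - \delta_r^2\right), \qquad m_{3,s} \ge \mu^3\left(\delta_s^4 - \delta_s^2\right), $$

and thus

$$ m_{3,t} \ge \mu^3\left[\alpha\,\delta_r^4 + (1-\alpha)\,\delta_s^4\right] - \mu^3\left[\alpha\,\delta_r^2 + (1-\alpha)\,\delta_s^2\right] = \mu^3\left[\alpha\,\delta_r^4 + (1-\alpha)\,\delta_s^4\right] - \mu^3\delta_t^2, \qquad (24) $$

where the last line uses the expression for $\delta_t^2$ introduced above. Next, note that

$$ \mu^3\delta_t^4 = \mu^3\left[\alpha^2\delta_r^4 + 2\alpha(1-\alpha)\,\delta_r^2\delta_s^2 + (1-\alpha)^2\delta_s^4\right]. \qquad (25) $$

On the right-hand side of Eq (24), add $\mu^3\delta_t^4$ and subtract the right-hand side of Eq (25):

$$ m_{3,t} \ge \mu^3\left(\delta_t^4 - \delta_t^2\right) + \mu^3\left[\alpha\,\delta_r^4 + (1-\alpha)\,\delta_s^4 - \alpha^2\delta_r^4 - 2\alpha(1-\alpha)\,\delta_r^2\delta_s^2 - (1-\alpha)^2\delta_s^4\right]. $$

Every term without $\delta_t$ on the right-hand side can be combined as

$$ \mu^3\,\alpha(1-\alpha)\left(\delta_r^2 - \delta_s^2\right)^2 \ge 0, $$

proving that the combined distribution function $P_t(x)$ must satisfy Eq (4) if the two original distributions satisfy that bound.
Finally, we need to generalize from the bidisperse distribution to any distribution. Following [8], we observe that any continuous distribution with some fixed $\mu = \mu_0$ can be approximated by a discrete distribution with values $a_i$ and probabilities $p_i$ and $\mu = \mu_0$. Rohatgi and Székely then proved that any such discrete distribution can be decomposed into a sum of discrete distributions with two values and $\mu = \mu_0$, that is, the bidisperse distributions that we have been considering (see also Appendix A). In the previous paragraph, we have shown that sums of distributions satisfy the bound. Thus, we have proven that Eq (4) holds for any distribution $P(x)$ of strictly non-negative values of $x$.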
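The mixing argument can be illustrated numerically. The sketch below mixes two hypothetical bidisperse distributions that share the mean $\mu = 1$ (one of them sits exactly on the bound $D_3 = \delta - 1/\delta$) and records the margin $D_3 - (\delta - 1/\delta)$ of the mixture for several weights $\alpha$. The helper names and example values are assumptions made for illustration.

```python
def moments(values, probs):
    """Return (mean, CoV, skewness) of a discrete distribution with positive mean."""
    mu = sum(p * x for x, p in zip(values, probs))
    m2 = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    m3 = sum(p * (x - mu) ** 3 for x, p in zip(values, probs))
    return mu, m2 ** 0.5 / mu, m3 / m2 ** 1.5


def mix(dist_r, dist_s, alpha):
    """Weighted sum alpha * P_r + (1 - alpha) * P_s of two discrete distributions."""
    values = dist_r[0] + dist_s[0]
    probs = [alpha * p for p in dist_r[1]] + [(1 - alpha) * p for p in dist_s[1]]
    return values, probs


# Hypothetical inputs, both with mean 1: P_r lies on the bound, P_s lies above it.
dist_r = ([0.0, 1.25], [0.2, 0.8])
dist_s = ([0.5, 3.0], [0.8, 0.2])

margins = []
for step in range(11):
    alpha = step / 10
    mu, delta, d3 = moments(*mix(dist_r, dist_s, alpha))
    margins.append(d3 - (delta - 1 / delta))
print(min(margins))
```

The margin stays non-negative for every weight, and vanishes only at $\alpha = 1$, where the mixture reduces to the distribution that sits on the bound.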

2.4 Distributions bounded by $x_{\min}$ or $x_{\max}$
We have considered distributions $P(x)$ for which $x \ge 0$. By rescaling the distribution, we can enforce any value of $\mu$ we would like. However, this comes at the expense of potentially running into our bounds. For example, one cannot have $\mu \le 0$ without a minimum less than or equal to zero. When some values of $x$ are below 0, we cannot simply rescale by a constant multiple to enforce the bounds. Of course, an additive constant would shift a distribution and make it non-negative. As noted in the introduction, this also leaves $D_3$ unchanged: consider $P(x)$ and $P'(x) = P(x - d)$. The mean becomes $\mu' = \mu + d$, but as the moments are defined through $(x - \mu)^n$, $m_2$ and $m_3$ are unchanged by this shift, and thus $D_3$ does not change.
Similarly, we also note that $\lim_{\mu \to 0^+}(a_+, a_-, \eta) = \lim_{\mu \to 0^-}(a_+, a_-, \eta)$. This limit can be calculated directly by multiplying by $\mu/\mu$ in Eq (12) and distributing the $\mu$ factor in Eqs (13,14), leaving us with just $\sqrt{m_2}$ where there was previously $\delta$. Therefore, we do not have to be concerned with means approaching zero. Now consider the general case of a distribution $P(x)$ bounded by $x_{\min}$ from below and with a mean $\mu$ which might be zero. Let us assume $P(x)$ has a nontrivial domain, which is to say, it is not a distribution which is only nonzero at one value (which would thus be $\sigma = 0$, $D_3 = 0$). The transformed distribution $P'(x) = P(x + x_{\min})$ has mean $\mu' = \mu - x_{\min}$. This transformed distribution now is nonzero only for $x \ge 0$, so is one of the distributions we considered above, and since the distribution has a nontrivial domain, $\mu' > 0$ must be true. Therefore, we have:

$$ \delta = \frac{\sigma}{\mu'} = \frac{\sigma}{\mu - x_{\min}}. \qquad (28) $$

That is, $\delta$ depends on the standard deviation $\sigma$ and mean $\mu$ of the original distribution $P(x)$, with the additional correction of subtracting $x_{\min}$, at which point we can use Eq (15) to find $D_{3,\min}$. The other interesting case is a distribution bounded by $x_{\max}$ from above. Considering $P''(x) = P(-x)$ changes the mean to be $\mu'' = -\mu$ and the skewness to be $D_3'' = -D_3$, but does not change the standard deviation. The distribution $P''(x)$ is now bounded from below by $-x_{\max}$ so we get:

$$ \delta = \frac{\sigma}{\mu'' - (-x_{\max})} = \frac{\sigma}{x_{\max} - \mu}, \qquad (29) $$

which goes into Eq (15) to calculate $D_{3,\min}$ for $P''(x)$. In this case, we actually have found $D_{3,\max} = -D_{3,\min}$ for the original distribution. Thus, we have rederived the results of [10], that is, Eq (4). If a distribution $P(x)$ has domain $x_{\min} \le x \le x_{\max}$ then the above results give both a lower and an upper bound on $D_3$. As a conceptual example, suppose that $x_{\min} = \mu - 3\sigma$ and $x_{\max} = \mu + 3\sigma$; then $-8/3 \le D_3 \le 8/3$. This is consistent with the empirical observation that the skewness tends to lie between $-3$ and $+3$.
As a useful check on these results, consider the bidisperse distribution again with probabilities $P(a_+)$ and $P(a_-)$ for sizes $a_- < a_+$. Here we have $x_{\min} = a_-$, and the CoV is given by Eq (9). Using Eqs (6), (28), and (4), one can solve for $D_{3,\min}$ in terms of the variables $\eta$ and $q$, recovering Eq (10): that is, $D_{3,\min}$ is achieved in this situation. Similarly, using $x_{\max} = a_+$ one finds again $D_{3,\max} = D_3$.
If we extend Eq (15) to any arbitrary upper or lower bound $x_{\rm bound}$, we get the following relationship for the extreme value of $D_3$, $D_{3,\mathrm{extr}}$:

$$ D_{3,\mathrm{extr}} = \frac{\delta}{1 - x_{\rm bound}/\mu} - \frac{1 - x_{\rm bound}/\mu}{\delta}, \qquad (30) $$

which reprises Eq (4).
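The skewness bounds for bounded support can be wrapped in a small helper. This is a minimal sketch assuming the bounds $D_{3,\min} = \delta_{\min} - 1/\delta_{\min}$ with $\delta_{\min} = \sigma/(\mu - x_{\min})$ and $D_{3,\max} = 1/\delta_{\max} - \delta_{\max}$ with $\delta_{\max} = \sigma/(x_{\max} - \mu)$; the function name and signature are illustrative.

```python
def skewness_bounds(mu, sigma, x_min=None, x_max=None):
    """Return (lower, upper) bounds on skewness; a side without a boundary gives None."""
    lower = upper = None
    if x_min is not None:
        delta_min = sigma / (mu - x_min)
        lower = delta_min - 1.0 / delta_min
    if x_max is not None:
        delta_max = sigma / (x_max - mu)
        upper = 1.0 / delta_max - delta_max
    return lower, upper


# Conceptual example from the text: boundaries three standard deviations from the mean.
lower, upper = skewness_bounds(mu=0.0, sigma=1.0, x_min=-3.0, x_max=3.0)
print(lower, upper)
```

Because the helper works with $\sigma$ and the distance from the mean to each boundary, it does not require the mean to be positive.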
3 Extensions to higher order moments

3.1 Notes to Generalize from Skewness
Going forward, we note that Eq (30) is useful for more than the extreme $D_3$ of the system when considering a bidisperse system. As noted at the start of Sec. 2, if one is given $\mu$ and two other quantities, then one can uniquely determine a bidisperse distribution. Thus knowing one size $x_{\rm bound}$, together with $\delta$ and $\mu$, determines the other size and the relative probabilities. Plugging any generic size $a/\mu$, which could be $a_+/\mu$ or $a_-/\mu$, into Eq (30) produces the $D_3$ of the bidisperse distribution with that size and a given CoV $\delta$. This equation can be solved for $a$ to give either of Eqs (13,14). In other words, if we know we have a bidisperse distribution, then Eq (30) is a formula for $D_3$ as a function of one of the sizes $a$. We will derive similar results for higher moments.

3.2 Kurtosis $D_4$
As noted in the introduction, previous results by Pearson [7] show that $D_4 \ge D_3^2 + 1$ for any given distribution as per Eq (3). Since Eq (30) now gives an inequality for $D_3$ on any bounded distribution, we can solve for a new limit on $D_4$ in terms of $x_{\min}$, $\mu$, and $\delta$. In particular, we have to consider two cases. Treating the situation where the distribution has only nonnegative support ($x_{\min} = 0$), then for $\delta < 1$, $D_{3,\min} < 0$. This implies that $D_3 = 0$ is also possible, and therefore we can achieve lower $D_4$ than is predicted by Eq (3) based on $D_{3,\min}$. In other words, we can consider the bidisperse distribution with $D_3 = 0$, which can be found using Eqs (11,12), to achieve $D_{4,\min} = 1$ as per Eq (3). For $\delta \ge 1$, $D_{3,\min} \ge 0$ and the limit on $D_4$ then follows from Eq (30). Thus we have

$$ D_4 \ge \begin{cases} 1, & \delta < 1, \\ \left(\delta - 1/\delta\right)^2 + 1 = \delta^2 + \delta^{-2} - 1, & \delta \ge 1, \end{cases} \qquad (31) $$

for the limits on $D_4$ in the two cases.
For the more general case of a distribution bounded on one side (by either $x_{\min}$ from below, or $x_{\max}$ from above, but not both), we can define the limits on kurtosis $D_4$ in terms of the extremum bounding value $x_{\rm extr}$. Define

$$ \delta_0 = \left| 1 - \frac{x_{\rm extr}}{\mu} \right|. \qquad (32) $$

That is, $\delta/\delta_0$ is the equivalent of Eqs (28,29). We then get

$$ D_4 \ge \begin{cases} 1, & \delta < \delta_0, \\ (\delta/\delta_0)^2 + (\delta_0/\delta)^2 - 1, & \delta \ge \delta_0. \end{cases} \qquad (34) $$

In other words, whether the distribution is bounded from below or bounded from above, in both cases this sets a minimum on $D_4$ but not a maximum. When the distribution is bounded from below by $x_{\min}$ and bounded from above by $x_{\max}$, the situation complicates further. We start by defining $\delta_{\min}$ and $\delta_{\max}$ analogously to Eq (32). While $x_{\min} < x_{\max}$, the ordering of $\delta_{\min}$ and $\delta_{\max}$ is not determined. Thus define

$$ \delta_1 = \min(\delta_{\min}, \delta_{\max}), \qquad \delta_2 = \max(\delta_{\min}, \delta_{\max}), $$

that is, $\delta_m$ where $m = 1, 2$. Next define $\delta'$ using

$$ \left(\frac{\delta'}{\delta_1}\right)^2 + \left(\frac{\delta_1}{\delta'}\right)^2 = \left(\frac{\delta'}{\delta_2}\right)^2 + \left(\frac{\delta_2}{\delta'}\right)^2, $$

which can be solved to get $\delta' = \sqrt{\delta_1 \delta_2} = \sqrt{\delta_{\min} \delta_{\max}}$. The limits on kurtosis $D_4$ are then

$$ \begin{aligned} 1 \le D_4 &\le (\delta/\delta_2)^2 + (\delta_2/\delta)^2 - 1, & \delta < \delta_1, \\ (\delta/\delta_1)^2 + (\delta_1/\delta)^2 - 1 \le D_4 &\le (\delta/\delta_2)^2 + (\delta_2/\delta)^2 - 1, & \delta_1 \le \delta \le \delta', \end{aligned} \qquad (39) $$

and values $\delta > \delta'$ are disallowed as they would require the bidisperse distribution be composed of values that lie outside of one or both of the boundaries $(x_{\min}, x_{\max})$. At $\delta = \delta'$, the only bidisperse distribution that is valid is composed of the two values $(x_{\min}, x_{\max})$ with appropriate probabilities necessary to get the value of $\delta$, and we have

$$ D_4(\delta') = \frac{\delta_1}{\delta_2} + \frac{\delta_2}{\delta_1} - 1. $$

These results are visualized in Fig 2a, which illustrates a specific example with $x_{\min} = 0$, $x_{\max} = 5$, and $\mu = 1$. For this example, $\delta_1 = 1.0$, $\delta' = 2.0$, and $D_4(\delta') = 3.25$. The solid lines indicate Inequalities 39, and the symbols indicate simulated random distributions with a specified $\delta$. Specifically, we generated distributions with data lying between limits $x_{\min}$, $x_{\max}$, and with enforced mean $\mu$, and calculated $\delta$ and $D_4$ for all. For a given small range of $\delta$, we generated 20,000 distinct random distributions, half that are bidisperse, and the other half with three or four values. Over these 20,000 distributions, Fig 2a plots the maximum and minimum $D_4$ found for each $\delta$, all of which lie between the limits we have found (shown by the lines). While we have not proven that the bidisperse distribution sets the limits for $D_4$ for all other distributions, this is suggestive that Inequalities 39 are indeed limits for the kurtosis for any distribution.

Fig 2. Simulations with bidisperse distributions, and tri- or quad-disperse distributions, yield extrema which are plotted against the prediction (black line) given by Eq (43) for $D_4$ (left) and $D_5$ (right). The data correspond to distributions with values limited to be greater than zero and less than $5\mu$. The bidisperse triangles are green (pointing up) for the minima and blue (pointing down) for the maxima, and the tri- or quad-disperse are pink (diamonds) for minima and purple (squares) for maxima. The more extreme values from quad- or tri-disperse were plotted for each polydispersity bin.
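The two-sided kurtosis limits can be sketched in code under explicit assumptions: the thresholds are taken to be $\delta_{\min} = |1 - x_{\min}/\mu|$ and $\delta_{\max} = |1 - x_{\max}/\mu|$, the boundary branch is taken to be $(\delta/\delta_m)^2 + (\delta_m/\delta)^2 - 1$, and $\mu > 0$ with $x_{\min} < \mu < x_{\max}$. These forms follow from a bidisperse distribution with one value on a boundary; the function name is hypothetical, and, as the text notes, the limits are only conjectured for general distributions.

```python
import math


def kurtosis_limits(mu, delta, x_min, x_max):
    """Bidisperse limits on D_4 for x_min <= x <= x_max (assumes mu > 0, x_min < mu < x_max)."""
    d_low = abs(1 - x_min / mu)     # threshold CoV set by the lower boundary
    d_high = abs(1 - x_max / mu)    # threshold CoV set by the upper boundary
    d1, d2 = min(d_low, d_high), max(d_low, d_high)
    d_prime = math.sqrt(d1 * d2)    # largest CoV compatible with both boundaries
    if delta > d_prime:
        raise ValueError("delta exceeds the largest value allowed by the bounds")

    def boundary_branch(d_ref):
        # D_4 of a bidisperse distribution with one value on the boundary
        z2 = (delta / d_ref) ** 2
        return z2 + 1 / z2 - 1

    lower = 1.0 if delta < d1 else boundary_branch(d1)
    upper = boundary_branch(d2)
    return lower, upper, d1, d_prime


# Example values quoted for Fig 2a: x_min = 0, x_max = 5, mu = 1, evaluated at the largest allowed CoV.
lower, upper, d1, d_prime = kurtosis_limits(mu=1.0, delta=2.0, x_min=0.0, x_max=5.0)
print(lower, upper, d1, d_prime)
```

At the largest allowed CoV the lower and upper limits coincide, reflecting that only the distribution made of the two boundary values remains.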

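3.3 Higher order generalized moments
We now proceed with an alternate derivation of Inequalities 34 which we can extend to higher moments. The generic definition of $D_n$ in the bidisperse case is:

$$ D_n = \frac{q\,(a_+ - \mu)^n + (1-q)\,(a_- - \mu)^n}{\sigma^n}. \qquad (40) $$

If we use Eqs (6) to solve for the generic definition of $D_n$ in terms of $q$ and $\eta$, we arrive at a formula of only $q$:

$$ D_n = \frac{(1-q)^{n-1} + (-1)^n q^{n-1}}{\left[q(1-q)\right]^{n/2 - 1}}. \qquad (41) $$

Plugging in $n = 3$ arrives back at Eq (10). For a bidisperse distribution, we can rewrite the second line of Inequality 34 as an equality in terms of $a$, one of the two bidisperse values. We then note that Eqs (30,34) are both functions of $z = \delta/(1 - a/\mu)$:

$$ D_3 = z - \frac{1}{z}, \qquad D_4 = z^2 - 1 + \frac{1}{z^2}. \qquad (42) $$

(In $D_4$, because only even powers of $z$ appear, the absolute value signs in Eq (32) can be dropped, allowing $z$ to have the same meaning for both $D_3$ and $D_4$.) The general pattern appears to be a finite sum of a geometric series. In fact, Appendix B shows how one can start from Eq (41) to derive

$$ D_n = \sum_{k=0}^{n-2} (-1)^k z^{\,n-2-2k} = \frac{z^{2n-2} + (-1)^n}{z^{\,n-2}\left(1 + z^2\right)}. \qquad (43) $$

One can immediately put in a value for $a$ of interest and get a potential limit of $D_n$. For example, for distributions bounded from below by $x_{\min}$ we conjecture

$$ D_{5,\min} = z^3 - z + \frac{1}{z} - \frac{1}{z^3}, \qquad (44) $$

with $z = \delta/(1 - x_{\min}/\mu)$ as above. As with $D_3$, our conjectured $D_{5,\max}$ is a similar equation using $z = \delta/(x_{\max}/\mu - 1)$. Figure 2b shows these two limits as the solid lines for the case $x_{\min} = 0$, $x_{\max} = 5$, and $\mu = 1$, along with the maximum and minimum observed $D_5$ values from numerically generated random distributions. All the random distributions lie within our conjectured analytic limits, again suggestive that they are the actual limits.
To try to show that these bounds achieve minima for any $n$, we can try a similar method as in Section 2.2. If we write out a more generic $m_n$, using $\Delta_+ = \delta^2/\Delta_-$ to eliminate $\Delta_+$:

$$ m_n = \mu^n \left[ q\,\Delta_+^n + (1-q)\,(-\Delta_-)^n \right] = \mu^n \sum_{k=0}^{n-2} (-1)^k\, \delta^{2(n-1-k)}\, \Delta_-^{\,2k+2-n}, \qquad (45) $$

we then can take its derivative with respect to $\Delta_-$, giving

$$ \left.\frac{\partial m_n}{\partial \Delta_-}\right|_{\delta} = \mu^n \sum_{k=0}^{n-2} (-1)^k\, (2k+2-n)\, \delta^{2(n-1-k)}\, \Delta_-^{\,2k+1-n}. \qquad (46) $$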
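The geometric-series pattern can be checked against a direct moment calculation. The sketch below assumes the series takes the form $D_n = \sum_{k=0}^{n-2} (-1)^k z^{n-2-2k}$ with $z = \delta/(1 - a/\mu)$, and compares it with the standardized moments of a bidisperse distribution whose smaller value is $a_- = 0$; the function names and the choice $\delta = 0.4$ are illustrative.

```python
def dn_series(z, n):
    """Finite geometric series: sum over k = 0..n-2 of (-1)**k * z**(n - 2 - 2k)."""
    return sum((-1) ** k * z ** (n - 2 - 2 * k) for k in range(n - 1))


def dn_direct(a_plus, a_minus, q, n):
    """Standardized moment D_n computed from the two values and the probability q of a_plus."""
    mu = q * a_plus + (1 - q) * a_minus

    def central(k):
        return q * (a_plus - mu) ** k + (1 - q) * (a_minus - mu) ** k

    return central(n) / central(2) ** (n / 2)


mu, delta = 1.0, 0.4
q = 1 / (1 + delta ** 2)          # with a_minus = 0 the CoV satisfies delta**2 = (1 - q) / q
a_plus, a_minus = mu / q, 0.0
z = delta / (1 - a_minus / mu)
comparison = {n: (dn_series(z, n), dn_direct(a_plus, a_minus, q, n)) for n in range(3, 7)}
print(comparison)
```

For $n = 3$ both columns reduce to $\delta - 1/\delta$, and for $n = 4$ to $D_3^2 + 1$, which is the equality case of Pearson's inequality for two-valued distributions.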
Eq (46) is negative for all odd values of $n$, implying that an increase in the smallest size above zero will only increase $D_n$: thus, for odd $n$, $D_n$ is minimized for a bidisperse distribution with the smallest size set to zero. For even $n$, Eq (46) is negative for $\Delta_-$ between 0 and $\delta$, but positive for $\Delta_- > \delta$. Thus, the minimum $m_n$ is achieved at $\Delta_- = \delta$. In fact, this recapitulates the result of Eq (31), that $D_{4,\min}$ is not a universal formula but rather depends on $\delta$. Furthermore, if we try to replicate Eqs (22-27) with $m_4$, the statements are untrue even when $\delta_r = \delta_s$. This gives credence to the idea that the boundaries of $D_n$ for even $n$ are not always given by the choice of $x_{\rm bound}$.
Lastly, as previously noted, a bidisperse distribution can be completely described by three parameters: most directly by the values $a_-$, $a_+$, and the probability $q$ for one of these values. Our approach has been to instead use $\mu$, $\delta$, and $a_-$ to find a constraint on $D_n$. We note that Eq (43) and the definition of $z$ are sufficient to find analogues of Eqs (11-14): thus, to use $D_n$, $\mu$, and $\delta$ to describe a bidisperse distribution. One can start with those three quantities and determine $a_-$, $a_+$, and $q$: analytically for $D_3$ as per Eqs (11-14), and numerically in other cases. This has been useful in the past for finding distributions with desired values of the moments [15]. Moreover, by then considering which values of $a_-$ and $a_+$ lie within bounds, one has a slightly alternate approach to finding bounds on $D_n$.
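The different behavior of odd and even moments can be seen numerically. This sketch assumes the normalized offsets $\Delta_+ = (a_+ - \mu)/\mu$ and $\Delta_- = (\mu - a_-)/\mu$, holds $\delta$ fixed through $\Delta_+ = \delta^2/\Delta_-$, and evaluates $m_4$ and $m_5$ on a grid of $\Delta_-$ values; the names and grid spacing are illustrative.

```python
def central_moment_fixed_cov(mu, delta, delta_minus, n):
    """n-th central moment of a bidisperse distribution at fixed mean and CoV."""
    delta_plus = delta ** 2 / delta_minus
    q = delta_minus / (delta_plus + delta_minus)   # probability of the larger value
    return mu ** n * (q * delta_plus ** n + (1 - q) * (-delta_minus) ** n)


mu, delta = 1.0, 0.4
grid = [0.05 * k for k in range(1, 21)]            # Delta_- from 0.05 to 1.0
m4 = [central_moment_fixed_cov(mu, delta, d, 4) for d in grid]
m5 = [central_moment_fixed_cov(mu, delta, d, 5) for d in grid]

argmin_m4 = grid[m4.index(min(m4))]                # location of the smallest fourth moment
m5_is_decreasing = all(b < a for a, b in zip(m5, m5[1:]))
print(argmin_m4, m5_is_decreasing)
```

On this grid the fourth moment is smallest at $\Delta_- = \delta$, where $D_4 = 1$, while the fifth moment keeps decreasing all the way to $\Delta_- = 1$, that is, to $a_- = 0$.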
We have presented an alternative derivation of Eq (4) to that presented in [10]; this equation provides bounds on the skewness $D_3$ for a bounded distribution with a given CoV $\delta$. Equivalently, if $D_3$ is given, then this equation provides a bound for $\delta$. Returning to our starting example, if one is considering a distribution of strictly positive numbers, then for a given $D_3$, Eq (4) can be solved for the maximum possible $\delta$.
Our results for $D_3$ naturally imply limits on $D_4$ (Inequalities 34 using Pearson's formula [7], and Inequalities 39 more generally). Our general methodology is to note that bidisperse distributions are characterized by three parameters, which most naturally are the two values $a_+$ and $a_-$ as well as the probability $q$ of the value $a_+$; however, one can fruitfully choose as the three parameters the mean $\mu$, coefficient of variation $\delta$, and $a_-$. Setting $a_-$ to the lower bound of all possible distributions with a given $\mu$ and $\delta$ leads to lower bounds for $D_3$ and $D_4$. Moreover, our methodology extends to higher moments, leading to conjectures for limits on higher standardized moments as discussed in Section 3.3. One possible extension to our work would be to see if there are other relationships between general $D_n$ and $D_m$. It would also be interesting to discover a counterexample where a distribution exists that exceeds the limits of $D_n$ set by considering bidisperse distributions as in Section 3.3. We note that numerically at least, we have not found such a counterexample for $n = 5$, as seen in the data of Fig 2.
Our results have implications for a prior computational study of the packing of spheres, and how the density of such packings depends on the CoV and skewness of a particle size distribution [15]. In that prior work, the results had a varying range of skewnesses but the authors did not comment on the choice of this range. In fact, the lower bound on skewness studied in that work corresponds to the result of Eq (4). This bound implies that a sphere packing composed of a distribution of radii with a given $\delta$ and lowest possible skewness is, in fact, equivalent to the packing of a distribution of equal-sized spheres; and the observed density of such packings in [15] obeyed this property, as it must. This is somewhat analogous to the circle packing shown in Fig 1a, for which the skewness has not yet reached the lower limit; nonetheless the packing is dominated by circles of the larger size.

A Discrete Distribution Decomposition
Rohatgi and Székely derived the result that any discrete distribution with mean $\mu$ can be decomposed into a sum of bidisperse distributions, all with mean $\mu$ [8]. Their derivation is terse, so we rederive the result in this Appendix with a slightly lengthier presentation.
First, consider a discrete distribution $P(x)$ where $x$ can take values $a_i$ with probability $p_i$ for $1 \le i \le n$, $\sum_i p_i = 1$, and with mean $\sum_i p_i a_i = \mu$. Replace $a_n$ and $a_{n-1}$ by

$$ a'_{n-1} = \frac{p_{n-1}\, a_{n-1} + p_n\, a_n}{p_{n-1} + p_n}, $$

which occurs with probability $p'_{n-1} = p_{n-1} + p_n$. This is now a new distribution with mean $\mu$ and one fewer value. This can be repeated until one ends with a final distribution that takes on three discrete values, $a_1$, $a_2$, and $a'_3$ with probabilities $p_1$, $p_2$, and $p'_3$. If we have a tridisperse distribution with three discrete values $(a_1, a_2, a_3)$, with probabilities $(p_1, p_2, p_3)$ and mean $\mu$, we can decompose this into the sum of two bidisperse distributions as follows. Without loss of generality, assume $a_1 < \mu$ and

