Numerous inequalities and related communications accompanying discrete divergence models in probability spaces

Abstract

The analysis of inequalities aids in the formulation of optimal coding schemes that improve the rate of information transfer while reducing the probability of error. This, in turn, supports robust and secure communication systems for applications ranging from telecommunications to data compression. Information-theoretic inequalities constitute a theoretical toolbox for designing channels that can overcome challenging environments, allowing information to be stored and communicated reliably and securely in a world that is becoming ever more interconnected. The paper introduces a general divergence model in general probability spaces and extends known information-theoretic inequalities and results based on variational models. We establish various inequalities for finite sequences of positive real numbers, the special cases of which are important in information theory, especially in connection with several divergence models available in the literature. Additionally, we derive certain other important relations concerning positive real numbers in connection with some divergence models.

Introduction

The well-established framework of coding theory allows for the examination of code combinations using discrete probabilistic entropic models and supports applications in many domains. Shannon [1] developed the concept of entropy within discrete probability spaces, forming the foundation of information theory. Shannon's widely accepted view of probabilistic entropy [1] expanded the coding-theory literature by inspiring several entropic models, and this established process laid the groundwork for discrete entropic models with unexpectedly favourable features.

We consider the set of all complete finite discrete probability distributions $P = (p_1, p_2, \ldots, p_n)$ with $p_i \geq 0$ and $\sum_{i=1}^{n} p_i = 1$, where $n \geq 2$. A non-degenerate distribution, that is, one in which every component is positive, is treated as a probability distribution in what follows. In many situations the distributions involved must have strictly positive components throughout, and we work with the corresponding sets of such distributions.

For every probability distribution $P = (p_1, p_2, \ldots, p_n)$, we recall various existing entropic models:

The Shannon [1] entropy:

$$H(P) = -\sum_{i=1}^{n} p_i \log p_i \qquad (1.1)$$

The Renyi [2] entropy:

$$H_{\alpha}(P) = \frac{1}{1-\alpha}\,\log\!\left(\sum_{i=1}^{n} p_i^{\alpha}\right), \quad \alpha > 0,\ \alpha \neq 1 \qquad (1.2)$$

The Havrda-Charvat [3] entropy:

$$H_{\alpha}^{HC}(P) = \frac{1}{2^{1-\alpha}-1}\left(\sum_{i=1}^{n} p_i^{\alpha} - 1\right), \quad \alpha > 0,\ \alpha \neq 1 \qquad (1.3)$$
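For readers who wish to experiment with these entropies numerically, the following minimal Python sketch computes the Shannon, Rényi, and Havrda-Charvát entropies for an arbitrary strictly positive probability vector. The function names and the example distribution are our own illustrative choices, not part of the original text, and natural logarithms are assumed throughout.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (1.1): H(P) = -sum p_i log p_i."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def renyi_entropy(p, alpha):
    """Renyi entropy (1.2): (1/(1-alpha)) log(sum p_i^alpha), alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def havrda_charvat_entropy(p, alpha):
    """Havrda-Charvat entropy (1.3): (sum p_i^alpha - 1) / (2^(1-alpha) - 1), alpha != 1."""
    p = np.asarray(p, dtype=float)
    return (np.sum(p ** alpha) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)

if __name__ == "__main__":
    P = [0.5, 0.25, 0.125, 0.125]            # hypothetical example distribution
    print(shannon_entropy(P))                # about 1.2130 nats
    print(renyi_entropy(P, alpha=2.0))
    print(havrda_charvat_entropy(P, alpha=2.0))
```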

Huang and Zhang [4] provided a new interpretation of Shannon's [1] mutual information, noting that it is often difficult to compute for uncountable sample spaces. The authors observed that asymptotic approximations based on Fisher information often provide accurate estimates of such information, and they related these estimates to particular divergence models in probability spaces. Furthermore, they carried out numerical experiments demonstrating that their proposed approach performs remarkably well and is readily applicable to a wide range of real and hypothetical problems. Discrete entropy models also have a wide range of applications. Sholehkerdar et al. [5] noted that entropy-based measures are often used to evaluate objective image-fusion quality owing to their straightforward implementation and small parameter set. In their paper, the authors presented a theoretical study of image-fusion quality measures based on Tsallis [6] entropy. The goal of this research was to evaluate whether the chosen quality measure could deliver the performance expected of an ideal information-based image-fusion quality metric. To assess the Tsallis [6] quality metric, the study used an image-formation model to obtain a closed-form expression for quality, with weighted averaging adopted as the fusion process. Their results show that the Tsallis-based quality measure deviates from the expected performance in its response to signal-to-noise-ratio variation and in the effect of the entropy order on the measured quality indicator. The authors also carried out evaluations on real photographs, the results of which confirmed agreement with the theoretical analysis. Lenormand et al. [7] discussed applications of entropy-based models in the urban environment, noting that describing and quantifying spatial inequalities across an urban area remains a complex task that has been aided by the increasing availability of large geolocated data sets. The outcomes of their investigation showed that an entropy-based measure of spatial distribution is a useful indicator of a region's socioeconomic status and related factors. Lu et al. [8] observed that entropy-based measures of predictability in physiological signals have been widely used in medical assessment and clinical diagnosis. The authors proposed a novel entropy-based pattern-learning scheme for assessing physiological signals that combines singular spectrum analysis and entropy metrics. Saraiva [9] provided a brief and intuitive summary of Shannon's [1] entropy, including selected properties, and offered applications of the model from two contrasting viewpoints: biological diversity and a case study on student mobility. Zhang and Shi [10] emphasized that Shannon's [1] entropy is a fundamental building block of information theory and an essential component of Machine Learning (ML) approaches. The authors derived asymptotic properties that require no assumptions on the underlying distribution, and these properties allow interval estimation and statistical tests involving the generalized Shannon entropy. Elgawad et al. [11] applied Shannon's [1] entropy to statistics, including order statistics and several known distributions. Stoyanov et al. [12] employed the maximum entropy technique to develop a criterion for the M-indeterminacy of distributions on the positive half-line (Stieltjes case) and derived some important conclusions. Furthermore, the authors showed how the maximum entropy is connected to symmetry characteristics and M-indeterminacy. Wang et al. [13] and others are among the researchers who have contributed to the study of entropic models. In the following, we recall the essential idea of the discrete inaccuracy model attributed to Kerridge [14].

Assume that an experimenter asserts that the probabilities of the outcomes of a random experiment are $q_1, q_2, \ldots, q_n$, whereas the true probabilities are $p_1, p_2, \ldots, p_n$. Then, utilizing certain compelling postulates, Kerridge [14] proved that the inaccuracy of the aforementioned claim is given by the following expression:

$$H(P; Q) = -\sum_{i=1}^{n} p_i \log q_i \qquad (1.4)$$
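As a quick numerical aid, the following minimal Python sketch evaluates the Kerridge inaccuracy directly from its definition; the probability vectors are hypothetical and of our own choosing, not taken from the paper. The well-known decomposition $H(P; Q) = H(P) + D(P\|Q)$ explains why the inaccuracy is minimized exactly when the asserted distribution equals the true one.

```python
import numpy as np

def kerridge_inaccuracy(p, q):
    """Kerridge inaccuracy (1.4): H(P; Q) = -sum p_i log q_i."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

# The inaccuracy splits into the true entropy plus the divergence of the
# asserted distribution from the true one: H(P; Q) = H(P) + D(P || Q),
# so it reduces to the Shannon entropy when the assertion is exactly right.
P = [0.5, 0.3, 0.2]   # hypothetical true probabilities
Q = [0.4, 0.4, 0.2]   # hypothetical asserted probabilities
print(kerridge_inaccuracy(P, P))  # equals H(P)
print(kerridge_inaccuracy(P, Q))  # >= H(P); the excess is D(P || Q)
```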

Sathar et al. [15] explored the past inaccuracy model and presented nonparametric estimators for it. The authors carefully studied the asymptotic behaviour of these estimators under various suitability and consistency requirements, and examined the performance of the proposed estimators using the Monte-Carlo simulation technique. Nath [16] suggested a non-additive inaccuracy model defined by the following expression:

(1.5)

where $\alpha > 0$, $\alpha \neq 1$, and $n \geq 2$ is an integer.

Various investigators have proposed novel inaccuracy models because of their applications in statistics, coding theory, and related subjects. Kapur [17], Molloy and Ford [18], and others have made contributions to the characterization and implementation of inaccuracy models.

Entropic models have been investigated as measures of the amount of information carried by a single probability distribution; it is equally customary to study models that assess the amount of information shared between two probability distributions, in the sense of how close the two distributions are to one another. Such measures are known as distance models, and the notion of distance is one of the most essential and fundamental ideas in the applications of information theory, with substantial implications across a wide variety of domains in the mathematical sciences.

Another view of the importance of such metrics is that several efforts have been made to broaden the idea of distance in domains other than mathematics. Disciplines such as economics, sociology, psychology, linguistics, genetics, and biology may all benefit from distance measurement. However, distance in such circumstances is not always geometrical, hence there is a need for modification when considering distance in probability spaces. To overcome these limitations, several novel divergence measures have been proposed and defined. The most important and practically useful divergence model is attributed to Kullback and Leibler [19] and is defined by the following formula:

$$D(P \,\|\, Q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} \qquad (1.6)$$
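A minimal Python sketch of (1.6), using hypothetical distributions of our own choosing, makes the two defining features easy to verify numerically: nonnegativity and asymmetry.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence (1.6): D(P || Q) = sum p_i log(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

# D(P || Q) >= 0 with equality iff P == Q, and it is not symmetric in general.
P = [0.5, 0.3, 0.2]   # hypothetical distributions for illustration
Q = [0.4, 0.4, 0.2]
print(kl_divergence(P, Q), kl_divergence(Q, P))  # two different nonnegative values
print(kl_divergence(P, P))                       # 0.0
```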

The following generalized parametric models of directed divergence have also been introduced and equipped with applications in the disciplines of statistics and operations research:

(1.7)(1.8)

where $\alpha$ is a real parameter.

(1.9)

Parkash and Kakkar [20] developed the following extended parametric models of directed divergence and used them in the area of coding theory.

(1.10)(1.11)(1.12)

Pronzato et al. [21], Kumari and Sharma [22], Torra et al. [23], Khalaj et al. [24], Dwivedi et al. [25], Nielsen [26], Fukumizu [27], Li et al. [28], and others have all contributed to the suggestion, characterization, generalization, and implementation of divergence models. In the sequel, we illustrate some novel implications of divergence models. To meet this aim, we rely on standard findings from Ash [29] and Cover and Thomas [30].

Research gap and motivation

Although many divergence measures, especially the Kullback-Leibler and Rényi divergences, have received a great deal of attention, existing work tends to focus on properties or special cases of these divergences rather than on a unified perspective on their relations through inequalities. The lack of cohesive inequalities linking these measures constrains their comparative analysis and practical interpretation across various probability models. Driven by this need, the current study formulates novel inequalities that generalize and extend existing divergence functions, thereby providing an enhanced understanding of the architecture of discrete probability spaces. This paper delineates new relations among generalized divergence measures, offers a formal demonstration of the monotonic behaviour of the Rényi divergence via inequality (2.7), and presents a symmetric divergence measure with enhanced analytical features. These findings jointly augment the theoretical foundation of divergence-based modelling and bolster its prospective applications in communication theory, coding systems, and statistical inference.

The impetus for creating new inequalities among these divergence metrics stems from their capacity to provide more precise limits and enhanced performance estimates for practical applications, including source coding, sensor data fusion, and statistical inference. Establishing these links connects abstract mathematical analysis to measurable improvements in practical systems.

Unless otherwise stated, throughout the remaining portions of the article we will assume that the finite sequences under consideration consist of positive real values.

Theorem 1.1. With the above conventions, the following inequality holds:

(1.13)

for all integers . If , then (1.13) holds only as an equality.

Theorem 1.2. With the above conventions and for , if , then we obtain the following inequality:

(1.14)

with the sign of equality iff .

Theorem 1.3. For , , the subsequent inequality holds good:

(1.15)

for , and (1.15) holds except when .

Theorem 1.4. With the above conventions and for , the following inequalities hold:

(1.16)

and

(1.17)

except when .

Theorem 1.5. With the above conventions and for a given integer, the following inequality holds:

(1.18)

If , then (1.18) holds as an equality. If , then equality in (1.18) holds only for the equality in

Various inequalities and other communications associated with divergence models

In this section, we derive some inequalities and other relations concerning positive real numbers and discuss their significance in relation to some divergence models.

Let , , an integer.

Definition 2.1. A function , an integer, is said to be a divergence function if, for all , ,

(2.1)

and

(2.2)

If, in addition, the following condition holds:

(2.3)

then is said to be a symmetric divergence function.

Example 2.2. Choose

where , an integer; is defined by the following expression:

(2.4)

This divergence function is known as the Kullback-Leibler [19] divergence function.

Choosing , , , , an integer. If , then . If , then by using Theorem 1.1, it follows that with the sign of equality iff .

Consequently,

For , it has numerous interesting interpretations. Firstly, it is known as the distance of one probability distribution from another probability distribution . Secondly, it is interpreted as a measure of inaccuracy due to Kerridge [14]. Thirdly, it is interpreted, following Cover and Thomas [30], as a measure of the inefficiency of assuming the probability distribution when, in reality, the true probability distribution is . Fourthly, it is also called Shannon's [1] information gain when the probability distribution is replaced by the probability distribution .

Definition 2.3. With the above conventions and for an integer, if , then the directed divergence, denoted by , of the sequence from the sequence is defined by the following expression:

(2.5)

Making use of Theorem 1.2, it follows that

with the sign of equality iff .

Accordingly, appears to be an appropriate generalization of when , .

Now, Renyi's [2] divergence model is specified by the following expression:

$$D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha - 1}\,\log\!\left(\sum_{i=1}^{n} p_i^{\alpha} q_i^{1-\alpha}\right), \quad \alpha > 0,\ \alpha \neq 1 \qquad (2.6)$$

The measure is called the information gain of order , , , obtained when the probability distribution is replaced by the probability distribution , an integer. We observe that

which is an expression termed as Shannon’s [1] information gain.

Consequently, may possibly be regarded as the information gain of order 1 and accordingly, it may be written as depending upon the situation.
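The limiting behaviour described above is easy to check numerically. The following Python sketch uses the conventional form of the Rényi divergence and hypothetical distributions of our own choosing; it shows the order-α divergence approaching the Kullback-Leibler divergence as α approaches 1.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence (1.6)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

def renyi_divergence(p, q, alpha):
    """Renyi divergence (2.6): (1/(alpha-1)) log(sum p_i^alpha q_i^(1-alpha))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

# As alpha -> 1 the Renyi divergence tends to the Kullback-Leibler divergence,
# i.e. Shannon's information gain, which is why it is read as the gain of order 1.
P = [0.5, 0.3, 0.2]   # hypothetical distributions
Q = [0.4, 0.4, 0.2]
print(renyi_divergence(P, Q, alpha=0.999))  # close to D(P || Q)
print(kl_divergence(P, Q))
```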

Renyi [2] established that with the sign of equality iff , an integer. We provide an alternative proof of this result. Keeping in view equation (1.15), it is sufficient to show that iff .

If , then .

Now suppose that . Then .

Also, by Theorem 1.1, iff , that is, , as , .

Since and , it follows that . Hence .

Proposition 2.4. Let , be a given real constant. With the above conventions and for an integer, the following inequality always holds:

(2.7)

If , then (2.7) holds as an equality. If , then equality in (2.7) holds iff .

Proof. If , then equality holds in (2.7), as both sides of it reduce to .

Now consider . In this case, by Theorem 2.7, we acquire

(2.8)

with equality in (2.8) iff , or equivalently , as , . The inequality (2.7) follows from (2.8).

Now, we give arguments for the importance of (2.7) in the field of information theory.

For a given integer , we demonstrate that is a non-decreasing function of , . Indeed,

(2.9)

If in equation (2.7), we take such that then the term within brackets on the right hand side of (2.9) is a nonnegative real number.

Consequently, . Hence, is a non-decreasing function of .

If there exist indices and such that , then strict inequality holds in (2.7). Consequently, if for some indices and , , then the term within brackets on the right-hand side of (2.9) is a positive real number.

Accordingly, . Hence is a strictly increasing function of and, as usual, it can readily be verified that

(2.10)

and

(2.11)
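A quick numerical check of this monotonicity is given below; it is an illustrative Python sketch, not part of the original derivation, and the distributions are hypothetical. It evaluates the Rényi divergence over a grid of orders and confirms that the values never decrease as α grows, as asserted around (2.9)-(2.11).

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

P = [0.5, 0.3, 0.2]                      # hypothetical distributions
Q = [0.4, 0.4, 0.2]
alphas = [0.25, 0.5, 0.75, 1.5, 2.0, 3.0]
values = [renyi_divergence(P, Q, a) for a in alphas]
print(values)
# Non-decreasing in alpha, in agreement with the monotonicity argument above:
assert all(v1 <= v2 + 1e-12 for v1, v2 in zip(values, values[1:]))
```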

Definition 2.5. With the above conventions and for an integer, if the inequality holds, then the directed divergence of order , denoted by , of the sequence from is defined as follows:

(2.12)

Consider the case . In this case, by assumption, . Now, from equation (2.12), we obtain the following relation:

with the sign of equality iff

Suppose . Making use of the assumption together with equations (2.25) and (2.26), we obtain the following expression:

unless

Now we demonstrate that iff .

If , then follows from (2.12).

Now suppose . Replacing by in (1.18) and using the fact that , we obtain the following expression:

(2.13)

Consider the case . Employing equations (2.5), (2.11), (2.12), (2.13) and the assumption , we obtain the following expression:

(2.14)

Since , equation (2.14) yields the following expression:

and .

Hence and . Consequently

The case can be discussed similarly. Consequently, we have demonstrated that with the sign of equality iff

It is observed that all the measures and are additive.

Applications to Shannon entropy and inaccuracy measures

Let us write

(2.15)

Motivated by this development, we may express some more measures of directed divergence. Consider

(2.16)

where . Since the right-hand side of (2.16) may be a negative real number, does not seem to be a satisfactory measure of directed divergence.

Now, consider the following expression:

(2.17)

where , . Here, too, the right-hand side of (2.17) may be a negative real number and consequently, also does not seem to be a satisfactory measure of directed divergence.

Let , a given integer, be a measure of directed divergence.

Define as

Then is a symmetric measure of directed divergence satisfying (2.1), (2.2) and (2.3).

Now define as

(2.18)

Then is also a symmetric measure of directed divergence between the probability distributions , . An important situation arises if we choose , where is the Kullback-Leibler [19] divergence defined by (2.4). In this case, writing in place of , we have

Symmetric divergence measures

$$J(P, Q) = D(P \,\|\, Q) + D(Q \,\|\, P) \qquad (2.19)$$

The symmetric measure is usually called J-divergence [31]. It can be written as

$$J(P, Q) = \sum_{i=1}^{n} (p_i - q_i) \log \frac{p_i}{q_i} \qquad (2.20)$$

Making use of (2.15), the J-divergence can be written as

(2.21)

For the sake of notational convenience, we write as and also write (2.21) as

(2.22)
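The equivalence of the two forms of the J-divergence quoted above can be confirmed numerically. The sketch below is written in Python with hypothetical distributions and function names of our own choosing; it computes the J-divergence both as the symmetrized sum of Kullback-Leibler divergences and directly from the (p_i - q_i) log(p_i/q_i) form, and checks its symmetry.

```python
import numpy as np

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

def j_divergence(p, q):
    """J-divergence (2.19)/(2.20): D(P||Q) + D(Q||P) = sum (p_i - q_i) log(p_i / q_i)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum((p - q) * np.log(p / q))

P = [0.5, 0.3, 0.2]   # hypothetical distributions
Q = [0.4, 0.4, 0.2]
print(j_divergence(P, Q))
print(kl_divergence(P, Q) + kl_divergence(Q, P))             # the same value
print(abs(j_divergence(P, Q) - j_divergence(Q, P)) < 1e-12)  # symmetric
```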

Corresponding to (2.22), Nath [16,32] developed the following expression:

(2.23)

where , . If , then . Now consider .

It is obvious that unless , .

If , ; then .

Consequently,

Now suppose . We prove that , . The case is trivial. We restrict our discussion to the case when . In this case,

This gives the following relation:

Accordingly, we must have .

But . Consequently, . Since the right hand side of (2.23) is symmetric in and ; it follows that

Hence is a symmetric measure of directed divergence between the probability distributions , .

Practical example: (Application to image fusion — numerical illustration)

Consider a simple image-fusion scenario in which two source images produce discrete pixel-intensity histograms that are modelled by probability vectors P and Q.

Here P and Q approximate pixel intensity distributions from different sensors or bands. We compute standard divergence measures between P and Q to illustrate how the inequalities and monotonicity results in this paper apply in practice.

1. Kullback–Leibler divergence (natural logarithm, “nats”):

Numerical values (computed term-by-term) give

hence the symmetric J-divergence

2. Rényi divergence of order (we use the conventional definition)

Monotonicity check and interpretation

For these P and Q we observe

which illustrates the monotonicity of the Rényi divergence in (the inequalities derived in the text). In the image-fusion context a small divergence (all values are small here) indicates that the fused image will retain characteristics of the original images with relatively little information loss; thus the inequalities and bounds we derived provide a quantitative basis for comparing fusion strategies and for choosing parameters in fusion algorithms.

Per-bin computation (illustrative terms)

Selected per-bin terms (rounded) used in the KL and Rényi computations:

These per-bin terms sum to the KL divergence and to the internal sums used in the Rényi computations; they are shown here to document and reproduce the numbers.
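The workflow of this illustration can be reproduced with the self-contained Python sketch below. The two eight-bin histograms P and Q are hypothetical stand-ins of our own choosing (they are not the vectors used in the paper); the script reports the per-bin Kullback-Leibler terms, the KL and J-divergences, and Rényi divergences at two orders so that the monotonicity claim can be checked directly.

```python
import numpy as np

def kl_divergence(p, q):
    return np.sum(p * np.log(p / q))

def renyi_divergence(p, q, alpha):
    return np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

# Hypothetical normalized 8-bin pixel-intensity histograms (stand-ins only).
P = np.array([0.10, 0.15, 0.20, 0.15, 0.10, 0.10, 0.10, 0.10])
Q = np.array([0.12, 0.13, 0.18, 0.17, 0.11, 0.09, 0.10, 0.10])

per_bin = P * np.log(P / Q)             # per-bin KL terms, as in the per-bin table
print("per-bin KL terms:", np.round(per_bin, 5))
print("D(P||Q) =", kl_divergence(P, Q))
print("D(Q||P) =", kl_divergence(Q, P))
print("J(P,Q)  =", kl_divergence(P, Q) + kl_divergence(Q, P))
for a in (0.5, 2.0):
    print(f"Renyi divergence, alpha={a}:", renyi_divergence(P, Q, a))
```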

To continue further, we require the following result:

Result 2.9. If and are positive and unequal real numbers, then the following inequalities always hold:

(2.24)(2.25)

Equality in (2.24) and (2.25) holds when or or .

Corresponding to (2.22), let us consider the following expression:

(2.26)

where , .

Making use of (1.3), (1.5) and (1.17), we obtain the following expression:

(2.27)

where , . If , then (2.27) gives . Now suppose . Consider any index , . If , then

for all , . If , , then by equations (2.24) and (2.25), we obtain the following inequalities:

if

if .

Hence, we obtain the following relations:

if

if .

Also if and if .

Consequently, we acquire

for all , .

Thus, for any index , , for all ,

This establishes the result.

Now we prove that iff . Indeed, from the above discussion, it is obvious that if , then . Now suppose that . We shall prove that . The case is trivial.

Now, we consider . From the above arguments, it is clear that is possible only when , that is, . Making use of Result 2.9, this is possible only when , as and . Since the right-hand side of (2.27) is symmetric in and , it follows that

Consequently, for all , , is a symmetric measure of directed divergence.

Conclusion

Inequalities in information theory are indispensable pillars in the realm of communication and data processing. Their prominence lies not only in their mathematical sophistication but also in their real-world applications across diverse disciplines. As we traverse a period of rapid technological progress and increasing dependence on interconnected systems, inequalities in information theory provide a compass for designing communication infrastructures that can withstand the complexities of the contemporary world. By addressing the challenges posed by noise, limited bandwidth, and security concerns, these inequalities pave the way for robust and efficient information exchange, ultimately shaping the future landscape of communication technologies. The inequalities, theorems, definitions, and divergence-related relations presented in this study are useful in the field of information theory. All the inequalities and related relations obtained here hold for positive real numbers and were established by employing several discrete divergence models.

References

  1. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27(3):379–423.
  2. Renyi A. On measures of entropy and information. In: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability. 1961. p. 547–61.
  3. Havrda J, Charvat F. Quantification methods of classification process: Concept of structural α-entropy. Kybernetika. 1967;3:30–5.
  4. Huang W, Zhang K. Approximations of Shannon mutual information for discrete variables with applications to neural population coding. Entropy (Basel). 2019;21(3):243. pmid:33266958
  5. Sholehkerdar A, Tavakoli J, Liu Z. Theoretical analysis of Tsallis entropy-based quality measure for weighted averaging image fusion. Inf Fusion. 2020;58:69–81.
  6. Tsallis C. Possible generalization of Boltzmann-Gibbs statistics. J Stat Phys. 1988;52:479–87.
  7. Lenormand M, Samaniego H, Chaves JC, da Fonseca Vieira V, da Silva MAHB, Evsukoff AG. Entropy as a measure of attractiveness and socioeconomic complexity in Rio de Janeiro metropolitan area. Entropy (Basel). 2020;22(3):368. pmid:33286142
  8. Lu Y, Wang M, Wu W, Zhang Q, Han Y, Kausar T, et al. Entropy-based pattern learning based on singular spectrum analysis components for assessment of physiological signals. Complexity. 2020;2020:1–17.
  9. Saraiva P. On Shannon entropy and its applications. Kuwait J Sci. 2023;50(3):194–9.
  10. Zhang J, Shi J. Asymptotic normality for plug-in estimators of generalized Shannon's entropy. Entropy (Basel). 2022;24(5):683. pmid:35626567
  11. Abd Elgawad MA, Barakat HM, Xiong S, Alyami SA. Information measures for generalized order statistics and their concomitants under general framework from Huang-Kotz FGM bivariate distribution. Entropy (Basel). 2021;23(3):335. pmid:33809021
  12. Stoyanov JM, Tagliani A, Novi Inverardi PL. Maximum entropy criterion for moment indeterminacy of probability densities. Entropy (Basel). 2024;26(2):121. pmid:38392376
  13. Wang Z, Yue H, Deng J. An uncertainty measure based on lower and upper approximations for generalized rough set models. Fund Inform. 2019;166(3):273–96.
  14. Kerridge DF. Inaccuracy and inference. J R Stat Soc Ser B Stat Methodol. 1961;23(1):184–94.
  15. Abdul Sathar EI, Viswakala KV, Rajesh G. Estimation of past inaccuracy measure for the right censored dependent data. Commun Stat Theory Methods. 2019;50(6):1446–55.
  16. Nath P. On the measures of errors in information. J Math Sci. 1968;3(1):1–16.
  17. Kapur JN. On the range of validity of certain measures of inaccuracy. Math Today. 1987;5:57–62.
  18. Molloy TL, Ford JJ. Towards strongly consistent online HMM parameter estimation using one-step Kerridge inaccuracy. Signal Processing. 2015;115:79–93.
  19. Kullback S, Leibler RA. On information and sufficiency. Ann Math Stat. 1951;22:79–86.
  20. Parkash O, Kakkar P. New information theoretic models, their detailed properties and new inequalities. Can J Pure Appl Sci. 2014;8(3):3115–23.
  21. Pronzato L, Wynn HP, Zhigljavsky A. Bregman divergences based on optimal design criteria and simplicial measures of dispersion. Stat Pap. 2019;60(2):195–214.
  22. Kumari R, Sharma DK. Generalized ‘useful’ non-symmetric divergence measures and inequalities. J Math Inequal. 2019;13(2):451–66.
  23. Torra V, Narukawa Y, Sugeno M. On the f-divergence for discrete non-additive measures. Inf Sci. 2020;512:50–63.
  24. Khalaj M, Tavakkoli-Moghaddam R, Khalaj F, Siadat A. New definition of the cross entropy based on the Dempster-Shafer theory and its application in a decision-making process. Commun Stat Theory Methods. 2019;49(4):909–23.
  25. Dwivedi A, Wang S, Tajer A. Discriminant analysis under f-divergence measures. Entropy (Basel). 2022;24(2):188. pmid:35205483
  26. Nielsen F. Statistical divergences between densities of truncated exponential families with nested supports: Duo Bregman and duo Jensen divergences. Entropy (Basel). 2022;24(3):421. pmid:35327931
  27. Fukumizu K. Estimation with infinite-dimensional exponential family and Fisher divergence. Inf Geom. 2024;7:609–22.
  28. Li H, Yang Z, Tu F, Deng L, Han Y, Fu X, et al. Mutation divergence over space in tumour expansion. J R Soc Interface. 2023;20(208):20230542. pmid:37989227
  29. Ash R. Information Theory. New York: Interscience Publishers; 1965.
  30. Cover TM, Thomas JA. Elements of Information Theory. New Delhi: Wiley India (P.) Ltd; 1999.
  31. Jeffreys H. Theory of Probability. Oxford: Clarendon Press; 1948.
  32. Nath P. Remarks on some measures of inaccuracy of finite discrete generalized probability distributions, entropy and ergodic theory. Selecta Stat Canad. 1974;2:77–100.