Figures
Abstract
The analysis of inequalities aids in the formulation of optimal coding schemes that improve the rate of information transfer while reducing the probability of error. This, in turn, has a major impact on building robust and secure communication systems for applications ranging from telecommunications to data compression. Information-theoretic inequalities constitute a theoretical toolbox for designing channels that can overcome challenging environments, allowing information to be stored and communicated reliably and securely in a world that is becoming ever more interconnected. The paper introduces a general divergence model in general probability spaces and extends known information-theoretic inequalities and results based on variational models. We establish various inequalities for finite sequences of positive real numbers, special cases of which are important in information theory, especially in connection with several divergence models available in the literature. Additionally, we derive certain other important relations concerning positive real numbers in connection with some divergence models.
Citation: Singh V, Sharma SK, Parkash O, Bin-Asfour M (2026) Numerous inequalities and related communications accompanying discrete divergence models in probability spaces. PLoS One 21(2): e0341742. https://doi.org/10.1371/journal.pone.0341742
Editor: Fucai Lin, Minnan Normal University, CHINA
Received: May 20, 2025; Accepted: January 11, 2026; Published: February 13, 2026
Copyright: © 2026 Singh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: The authors extend their appreciation to the Deanship of Postgraduate Studies and Scientific Research at Majmaah University for funding this research work through Project Number R-2026-39.
Competing interests: The authors have declared that no competing interests exist. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Introduction
The well-established framework of coding theory allows for the examination of code combinations using discrete probabilistic entropic models and supports applications in many domains. Shannon [1] developed the concept of entropy within discrete probability spaces, forming the foundation of information theory. Shannon’s widely accepted view of probabilistic entropy [1] expanded the coding theory literature by motivating several entropic models. This established process laid the groundwork for a discrete entropic model with unexpectedly favourable features.
We consider the assemblage of all discrete probability distributions with nonnegative components and complete support on a set of cardinality n and
. A non-degenerate distribution
is regarded as a probability distribution. In many cases, transactions must be delivered using discrete probability distributions, each of which contains a positive real value. Consequently, we define the following sets:
For every probability distribution, we recall several established entropic models:
The Shannon [1] entropy:
The Renyi [2] entropy:
The Havrda-Charvat [3] entropy:
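For reference, the standard forms of these three models, as given in the cited sources [1–3], for a complete probability distribution P = (p1, …, pn), are:

```latex
% Shannon [1] entropy
H(P) = -\sum_{i=1}^{n} p_i \log p_i
% Renyi [2] entropy of order \alpha
H_{\alpha}(P) = \frac{1}{1-\alpha}\,\log \sum_{i=1}^{n} p_i^{\alpha},
  \qquad \alpha > 0,\ \alpha \neq 1
% Havrda-Charvat [3] entropy of degree \alpha
H_{\alpha}^{HC}(P) = \frac{1}{2^{1-\alpha}-1}
  \left(\sum_{i=1}^{n} p_i^{\alpha} - 1\right),
  \qquad \alpha \neq 1
```

The Rényi entropy reduces to the Shannon entropy as α → 1, and the Havrda-Charvát entropy does likewise under its normalization.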
Huang and Zhang [4] provided a new interpretation of Shannon’s [1] mutual information, noting that it is often difficult to compute for uncountable sample spaces. The authors observed that asymptotic methods based on Fisher information often provide accurate estimates of such information, and they derived estimates based on particular divergence models in probability spaces. Furthermore, the authors performed numerical experiments and demonstrated that their proposed method was highly effective and readily applicable to a wide range of real and hypothetical problems. Discrete entropy models also have a wide range of applications. Sholehkerdar et al. [5] noted that entropy-based measures are often used to evaluate objective image fusion quality owing to their simple entropy computation and small parameter set. The authors presented a theoretical study of image fusion quality measures based on Tsallis [6] entropy. The goal of this research was to evaluate whether the chosen quality measure was capable of delivering the performance expected of an ideal information-based image fusion quality metric. To assess the Tsallis [6] quality metric, the study used an image formation model to generate a closed-form expression for quality, with weighted averaging used as the fusion process. Their results show that the Tsallis-based quality measure deviates from expected performance in terms of its response to signal-to-noise ratio variation and the effect of entropy order on the measured quality indicator. The authors also conducted evaluations on real photographs, the results of which confirmed agreement with the theoretical analysis. Lenormand et al. [7] discussed the applications of entropy-based models in the urban environment, noting that describing and quantifying spatial inequalities across the urban landscape remains a complex task that has been aided by the increasing availability of large geolocated catalogues. The outcomes of their investigation showed that the entropy-based measure of spatial distribution is a useful indicator of a region’s socioeconomic status and related factors. Lu et al. [8] noted that measures of predictability in physiological signals based on entropy metrics have been widely used in medical assessment and clinical diagnosis. The authors proposed a novel entropy-based pattern-learning method for assessing physiological signals that combines singular spectrum analysis and entropy metrics. Saraiva [9] provided a brief and intuitive summary of Shannon’s [1] entropy, including selected characteristics, and offered applications of the model from two contrasting viewpoints: biological diversity and a case study on student mobility. Zhang and Shi [10] emphasized that Shannon’s [1] entropy is a fundamental building block of information theory and an essential component of Machine Learning (ML) approaches. The authors derived asymptotic properties that do not require any assumptions on the underlying distribution, and these properties allow for interval estimation and statistical tests with the full Shannon entropy. Elgawad et al. [11] applied Shannon’s [1] entropy to statistics, including order statistics and several known distributions. Stoyanov et al. [12] employed the maximum entropy technique to develop a concept of M-indeterminacy of distributions on the positive half-line (Stieltjes case) and derived some important conclusions. Furthermore, the authors showed how maximum entropy is connected to symmetry characteristics and M-indeterminacy. Wang et al. [13] and others are among the pioneers who have committed to studying entropic models. In the following, we discuss the essential perspective of the discrete inaccuracy model attributed to Kerridge [14].
Assume that an experimenter asserts that the probability of the outcome of a random experiment is, despite the true probability being. Then, utilizing certain compelling postulates, Kerridge [14] proved that the inaccuracy of the aforementioned claim is given by the following expression:
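In its standard form, with P = (p1, …, pn) the true distribution and Q = (q1, …, qn) the asserted one, Kerridge's inaccuracy reads:

```latex
% Kerridge [14] inaccuracy of asserting Q when P is true
I(P; Q) = -\sum_{i=1}^{n} p_i \log q_i
% Note the decomposition I(P; Q) = H(P) + D(P \| Q): inaccuracy equals
% the Shannon entropy of P plus the Kullback-Leibler divergence of P from Q.
```

The decomposition shows that inaccuracy is minimized exactly when the asserted distribution coincides with the true one.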
Sathar et al. [15] explored the past inaccuracy model and presented nonparametric estimators for these models. The authors carefully explored the asymptotic behaviour of these estimators under various suitability and consistency requirements. Additionally, the authors examined the performance of the proposed estimators using the Monte-Carlo simulation technique. Nath [16] also suggested the non-additive inaccuracy model defined by the following expression:
where ,
,
an integer.
Various investigators have proposed novel inaccuracy models because of their applications in statistics, coding theory, and other related subjects. Kapur [17], Molloy and Ford [18], and others have contributed to the characterisation and implementation of inaccuracy models.
While entropic models have been investigated as measures of the amount of information for a given probability distribution, it is customary to scrutinize models that assess the amount of information shared between two probability distributions, in the sense of how close the two distributions are to one another. Such measurements are known as distance models, and the notion of distance is one of the most essential and fundamental ideas in the various applications of information theory, with substantial implications across a wide variety of mathematical sciences.
Another indication of the importance of such metrics is that several efforts have been made to broaden the idea of distance to domains beyond mathematics. Disciplines such as economics, sociology, psychology, linguistics, genetics, and biology may all benefit from distance measurement. However, distance in such circumstances is not always geometrical; hence there is a need for modification when it comes to distance in probability spaces. Here, we emphasize the need for such adjustments when considering the distance model in probability spaces. To overcome these limitations, several novel divergence metrics have been proposed and defined. The most important and practically useful divergence model is attributed to Kullback and Leibler [19], as defined by the following formula:
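In its standard form, the Kullback-Leibler directed divergence of P = (p1, …, pn) from Q = (q1, …, qn) is:

```latex
% Kullback-Leibler [19] directed divergence of P from Q
D(P \| Q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i}
% D(P \| Q) >= 0, with equality iff P = Q, by Jensen's inequality
% applied to the convex function t -> t log t.
```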
The following generalized parametric models of directed divergence have been proposed and equipped with applications in the disciplines of statistics and operations research:
where is a real parameter.
Parkash and Kakkar [20] developed the following extended parametric models of directed divergence and used them in the area of coding theory.
Pronzato et al. [21], Kumari and Sharma [22], Torra et al. [23], Khalaj et al. [24], Dwivedi et al. [25], Nielsen [26], Fukumizu [27], Li et al. [28], and others have all contributed to the suggestions, characterizations, generalizations, and implementations of divergence models. In the sequel, we illustrate some novel implications of divergence models. To satisfy our aim, we need the following standard findings from Ash [29] and Cover and Thomas [30]:
Research gap and Motivation
Although many divergence measures, especially the Kullback-Leibler and Rényi divergences, have received a great deal of attention, existing work tends to focus on properties or special cases of individual divergences rather than a unified perspective on their relations through inequalities. The lack of cohesive inequalities linking these measures constrains their comparative analysis and practical interpretation across various probability models. Driven by this need, the current study formulates novel inequalities that generalize and extend existing divergence functions, thereby providing enhanced understanding of the architecture of discrete probability spaces. This paper delineates new relations among generalized divergence measures, offers a formal demonstration of the monotonic behaviour of Rényi divergence via inequality (2.7), and presents a symmetric divergence measure with enhanced analytical features. These findings jointly augment the theoretical foundation of divergence-based modelling and bolster its prospective applications in communication theory, coding systems, and statistical inference.
The impetus for creating new inequalities among these divergence metrics stems from their capacity to provide more precise limits and enhanced performance estimates for practical applications, including source coding, sensor data fusion, and statistical inference. Establishing these links connects abstract mathematical analysis to measurable improvements in practical systems.
Unless otherwise stated, throughout the remainder of the article, we will assume that
and
and are positive real values.
Theorem 1.1. With the above stated conventions, the following inequality holds:
for all integers . If
, then (1.13) holds with equality.
Theorem 1.2. With the above stated conventions and for if
, then we obtain the following inequality:
with the sign of equality iff .
Theorem 1.3. For ,
, the following inequality holds:
for ,
and (1.15) holds strictly except when
.
Theorem 1.4. With the above stated conventions and for , the following inequalities hold:
and
except when .
Theorem 1.5. With the above stated conventions and for a given integer, the following inequality holds:
If , then (1.18) holds as an equality. If
, then the sign of equality in (1.18) holds only for equality in
Various inequalities and other relations associated with divergence models
In this section, we derive some inequalities and other relations concerning positive real numbers and discuss their significance in relation to some divergence models.
Let ,
,
an integer.
Definition 2.1. A function ,
an integer, is said to be a divergence function if, for all
,
,
and
If, in addition, we have the following condition:
then is said to be a symmetric divergence function.
Example 2.2. Choose
where ,
an integer; is defined by the following expression:
This divergence function is known as the Kullback-Leibler [19] divergence function.
Choosing ,
,
,
,
an integer. If
, then
. If
, then by using Theorem 1.1, it follows that
with the sign of equality iff
.
Consequently,
For , it has numerous interesting interpretations: Firstly, it is acknowledged as the distance of one probability distribution
from another probability distribution
. Secondly, it is interpreted as a measure of inaccuracy of error due to Kerridge [14]. Thirdly, it is interpreted as a measure of inefficiency due to Cover and Thomas [30] of assuming the probability distribution
when, in reality, the true probability distribution is
. Fourthly, it is also called Shannon’s [1] information gain when the probability distribution
is replaced by the probability distribution
.
Definition 2.3. With the above stated conventions and for an integer, if
, then the directed divergence denoted by
of the sequence
from the sequence
is defined by the following expression:
Applying Theorem 1.2, it follows that
with the sign of equality iff
.
Accordingly appears to be an appropriate generalization of
when
,
.
Now, Rényi’s [2] divergence model is specified by the following expression:
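In its conventional form, the Rényi divergence of order α between distributions P and Q reads:

```latex
% Renyi [2] divergence of order \alpha between P and Q
D_{\alpha}(P \| Q) = \frac{1}{\alpha - 1}\,
  \log \sum_{i=1}^{n} p_i^{\alpha}\, q_i^{\,1-\alpha},
  \qquad \alpha > 0,\ \alpha \neq 1
```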
The measure is called the information gain of order
,
,
, when the probability distribution
is replaced by the probability distribution
,
an integer. We observe that
which is an expression known as Shannon’s [1] information gain.
Consequently, may possibly be regarded as the information gain of order 1 and accordingly, it may be written as
depending upon the situation.
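The identification of the order-1 case with Shannon's information gain follows from the limit below, which can be verified with L'Hôpital's rule:

```latex
% As \alpha -> 1, the Renyi divergence recovers the Kullback-Leibler
% divergence; differentiate numerator and denominator with respect to \alpha:
\lim_{\alpha \to 1} D_{\alpha}(P \| Q)
  = \lim_{\alpha \to 1}
    \frac{\sum_{i} p_i^{\alpha} q_i^{1-\alpha} \log(p_i/q_i)}
         {\sum_{i} p_i^{\alpha} q_i^{1-\alpha}}
  = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} = D(P \| Q)
```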
Renyi [2] investigated and established that with the sign of equality iff
an integer. We provide an alternative proof of this result. In view of equation (1.15), it suffices to verify that
iff
.
If
; then
.
Now suppose that . Then
.
Also, by Theorem 1.1, iff
, that is,
, as
,
.
Since and
it follows that
. Hence
.
Proposition 2.4. Let ,
be a specified real constant. With the above stated conventions, for
an integer, the following inequality always holds:
If , then (2.7) holds as an equality. If
, then the sign of equality in (2.7) holds iff
.
Proof. If , then the sign of equality holds in (2.7), as both sides of it reduce to
.
Now consider . In this case, by Theorem 2.7, we obtain
with the sign of equality in (2.8) iff or equivalently,
as
,
. The inequality (2.7) follows from (2.8).
Now, we present arguments for the importance of (2.7) in the field of information theory.
For all
a prescribed integer, we demonstrate that
is a non-decreasing function of
,
. Indeed,
If in equation (2.7), we take
such that
then the term within brackets on the right hand side of (2.9) is a nonnegative real number.
Consequently, . Hence,
is a non-decreasing function of
.
If there exist indices and
, such that
, then strict inequality holds in (2.7). Consequently, if
for some indices
and
,
, then the term within brackets on the right hand side of (2.9) is a positive real number.
Accordingly, . Hence
is a strictly monotonic increasing function of
and, as usual, it can be shown that
and
Definition 2.5. With the above stated conventions and for an integer, if the inequality
holds, then the directed divergence of order
, denoted by
of the sequence
from
is defined as follows:
Consider the case . In this case, by assumption,
. Now from equation (2.12), we obtain the following relation:
with the sign of equality iff
Suppose . Using the assumption
, together with equations (2.25) and (2.26), we obtain the following expression:
unless
Now we demonstrate that iff
.
If , then
follows from (2.12).
Now suppose . Replacing
by
in (1.18) and using the fact that
, we obtain the following expression:
Consider the case . Upon employing equations (2.5), (2.11), (2.12), (2.13) and the assumption
we obtain the following expression:
Since , equation (2.14) yields the following expression:
and
.
Hence and
. Consequently
The case can be discussed similarly. Consequently, we have demonstrated that
with the sign of equality iff
It is observed that all the measures
and
are additive.
Applications to Shannon entropy and inaccuracy measures
Let us write
Motivated by this development, we may express some more measures of directed divergence. Consider
where
Since the right hand side of (2.16) may be a negative real number,
does not seem to be a satisfactory measure of directed divergence.
Now, consider the following expression:
where ,
. Here, too, the right hand side of (2.17) may be a negative real number and consequently,
also does not seem to be a satisfactory measure of directed divergence.
Let,
a given integer be a measure of directed divergence.
Define as
Then is a symmetric measure of directed divergence satisfying (2.1), (2.2) and (2.3).
Now define as
Then is also a symmetric measure of directed divergence between the probability distributions
,
. An important situation arises if we choose
where
is the Kullback-Leibler [19] divergence defined by (2.4). In this case, writing
in place of
, we have
Symmetric divergence measures
The symmetric measure is usually called J-divergence [31]. It can be written as
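In its standard form, the J-divergence is the symmetrized Kullback-Leibler divergence:

```latex
% Jeffreys' J-divergence [31]: the symmetrized KL divergence
J(P, Q) = D(P \| Q) + D(Q \| P)
        = \sum_{i=1}^{n} (p_i - q_i) \log \frac{p_i}{q_i}
% Each summand is nonnegative, since (p_i - q_i) and log(p_i/q_i)
% always share the same sign; hence J(P, Q) >= 0 with equality iff P = Q.
```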
Making use of (2.15), the J-divergence can be written as
For the sake of notational convenience, we write as
and also write (2.21) as
Corresponding to (2.22), Nath [16,32] developed the following expression:
where ,
. If
, then
. Now consider
.
It is obvious that unless
,
.
If ,
; then
.
Consequently,
Now suppose . We prove that
,
. The case
is trivial. We restrict our discussion to the case when
. In this case,
This yields the following relation:
Accordingly, we must have .
But . Consequently,
. Since the right hand side of (2.23) is symmetric in
and
; it follows that
Hence is a symmetric measure of directed divergence between the probability distributions
,
.
Practical example: (Application to image fusion — numerical illustration)
Consider a simple image-fusion scenario where two source images () and (
) produce discrete pixel-intensity histograms that are modelled by probability vectors
Here P and Q approximate pixel intensity distributions from different sensors or bands. We compute standard divergence measures between P and Q to illustrate how the inequalities and monotonicity results in this paper apply in practice.
1. Kullback–Leibler divergence (natural logarithm, “nats”):
Numerical values (computed term-by-term) give
hence the symmetric J-divergence
2. Rényi divergence of order (we use the conventional definition)
Monotonicity check and interpretation
For these P and Q we observe
which illustrates the monotonicity of the Rényi divergence in (the inequalities derived in the text). In the image-fusion context a small divergence (all values are small here) indicates that the fused image will retain characteristics of the original images with relatively little information loss; thus the inequalities and bounds we derived provide a quantitative basis for comparing fusion strategies and for choosing parameters in fusion algorithms.
Per-bin computation (illustrative terms)
Selected per-bin terms (rounded) used in the KL and Rényi computations:
These per-bin terms sum to the KL divergence and to the internal sums used in the Rényi computations; they are shown here to document and reproduce the numbers.
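Since the numerical values of P and Q are not reproduced above, the computation can be carried out for any pair of complete probability vectors. The following sketch, with illustrative hypothetical histograms standing in for the paper's actual data, computes the Kullback-Leibler, J-, and Rényi divergences and checks the monotonicity in the order α discussed in the text:

```python
import math

# Hypothetical pixel-intensity histograms (illustrative only; the paper's
# actual P and Q are not reproduced here). Any two complete probability
# vectors of equal length may be substituted.
P = [0.40, 0.30, 0.20, 0.10]
Q = [0.30, 0.30, 0.25, 0.15]

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def renyi(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1) in nats."""
    s = sum(pi**alpha * qi**(1 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1)

# Symmetric J-divergence: sum of the two directed divergences.
j_div = kl(P, Q) + kl(Q, P)

# Monotonicity of the Renyi divergence in alpha, as established in the text:
orders = [0.5, 0.9, 1.5, 2.0]
values = [renyi(P, Q, a) for a in orders]
assert all(v1 <= v2 for v1, v2 in zip(values, values[1:]))
```

The final assertion verifies numerically that the Rényi divergence is non-decreasing in its order, matching the monotonicity result proved via inequality (2.7); taking α close to 1 recovers the Kullback-Leibler value.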
To continue further, we require the following result:
Result 2.9. If and
be positive and unequal real numbers, then the following inequalities always hold:
The sign of equality in (2.24) and (2.25) holds when or
or
.
Corresponding to (2.22), let us consider the following relation:
where ,
.
Using (1.3), (1.5), and (1.17), we obtain the following expression:
where ,
. If
, then (2.27) gives
. Now suppose
. Consider any index
,
. If
, then
for all
,
. If
,
, then by equations (2.24) and (2.25), we obtain the following inequalities:
if
if
.
Hence, we obtain the following relations:
if
if
.
Also if
and
if
.
Consequently, we obtain
for all
,
.
Thus, for any index ,
,
for all
,
This establishes the result.
Now we prove that iff
. Indeed, from the above discussion, it is clear that if
; then
. Now suppose that
. We shall prove that
. The case
is trivial.
Now, we consider . From the above arguments, it is clear that
, is possible when
, that is,
. Making use of Result 2.9, this is possible only when
, as
and
. Since the right hand side of (2.27) is symmetric in
and
; it follows that
Consequently, for all ,
,
is a symmetric measure of directed divergence.
Conclusion
Inequalities in information theory are indispensable pillars in the realm of communication and data processing. Their prominence lies not only in their mathematical sophistication but also in their real-world applications across miscellaneous disciplines. As we traverse a period of rapid technological progress and increasing dependence on interconnected systems, inequalities in information theory provide a compass for designing communication infrastructures that can withstand the complexities of the contemporary world. By addressing the challenges posed by noise, restricted bandwidth, and security concerns, these inequalities pave the way for robust and efficient information exchange, ultimately influencing the future landscape of communication technologies. The inequalities, theorems, definitions, and divergence-related relations presented in this study are useful in the field of information theory. All of the inequalities and related relations established here hold for positive real numbers, and further results may be discovered by employing additional discrete divergence models.
References
- 1. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27(3):379–423.
- 2. Renyi A. On measures of entropy and information. In: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability. 1961. p. 547–61.
- 3. Havrda J, Charvat F. Quantification methods of classification process: Concept of structural α-entropy. Kybernetika. 1967;3:30–5.
- 4. Huang W, Zhang K. Approximations of shannon mutual information for discrete variables with applications to neural population coding. Entropy (Basel). 2019;21(3):243. pmid:33266958
- 5. Sholehkerdar A, Tavakoli J, Liu Z. Theoretical analysis of Tsallis entropy-based quality measure for weighted averaging image fusion. Inf Fusion. 2020;58:69–81.
- 6. Tsallis C. Possible generalization of Boltzmann-Gibbs statistics. J Stat Phys. 1988;52:479–87.
- 7. Lenormand M, Samaniego H, Chaves JC, da Fonseca Vieira V, da Silva MAHB, Evsukoff AG. Entropy as a measure of attractiveness and socioeconomic complexity in rio de janeiro metropolitan area. Entropy (Basel). 2020;22(3):368. pmid:33286142
- 8. Lu Y, Wang M, Wu W, Zhang Q, Han Y, Kausar T, et al. Entropy-based pattern learning based on singular spectrum analysis components for assessment of physiological signals. Complexity. 2020;2020:1–17.
- 9. Saraiva P. On Shannon entropy and its applications. Kuwait J Sci. 2023;50(3):194–9.
- 10. Zhang J, Shi J. Asymptotic normality for plug-in estimators of generalized shannon’s entropy. Entropy (Basel). 2022;24(5):683. pmid:35626567
- 11. Abd Elgawad MA, Barakat HM, Xiong S, Alyami SA. Information measures for generalized order statistics and their concomitants under general framework from Huang-Kotz FGM bivariate distribution. Entropy (Basel). 2021;23(3):335. pmid:33809021
- 12. Stoyanov JM, Tagliani A, Novi Inverardi PL. Maximum entropy criterion for moment indeterminacy of probability densities. Entropy (Basel). 2024;26(2):121. pmid:38392376
- 13. Wang Z, Yue H, Deng J. An uncertainty measure based on lower and upper approximations for generalized rough set models. Fund Inform. 2019;166(3):273–96.
- 14. Kerridge DF. Inaccuracy and Inference. J R Stat Soc Series B: Statistical Methodol. 1961;23(1):184–94.
- 15. Abdul Sathar EI, Viswakala KV, Rajesh G. Estimation of past inaccuracy measure for the right censored dependent data. Commun Stat Theory Methods. 2019;50(6):1446–55.
- 16. Nath P. On the measures of errors in information. J Math Sci. 1968;3(1):1–16.
- 17. Kapur JN. On the range of validity of certain measures of inaccuracy. Math Today. 1987;5:57–62.
- 18. Molloy TL, Ford JJ. Towards strongly consistent online HMM parameter estimation using one-step Kerridge inaccuracy. Signal Processing. 2015;115:79–93.
- 19. Kullback S, Leibler RA. On information and sufficiency. Ann Math Stat. 1951;22:79–86.
- 20. Parkash O, Kakkar P. New information theoretic models, their detailed properties and new inequalities. Can J Pure Appl Sci. 2014;8(3):3115–23.
- 21. Pronzato L, Wynn HP, Zhigljavsky A. Bregman divergences based on optimal design criteria and simplicial measures of dispersion. Stat Pap. 2019;60(2):195–214.
- 22. Kumari R, Sharma DK. Generalized ‘useful’ non-symmetric divergence measures and inequalities. J Math Inequal. 2019;13(2):451–66.
- 23. Torra V, Narukawa Y, Sugeno M. On the f-divergence for discrete non-additive measures. Inf Sci. 2020;512:50–63.
- 24. Khalaj M, Tavakkoli-Moghaddam R, Khalaj F, Siadat A. New definition of the cross entropy based on the Dempster-Shafer theory and its application in a decision-making process. Commun Stat Theory Methods. 2019;49(4):909–23.
- 25. Dwivedi A, Wang S, Tajer A. Discriminant analysis under f-divergence measures. Entropy (Basel). 2022;24(2):188. pmid:35205483
- 26. Nielsen F. Statistical divergences between densities of truncated exponential families with nested supports: Duo bregman and duo jensen divergences. Entropy (Basel). 2022;24(3):421. pmid:35327931
- 27. Fukumizu K. Estimation with infinite-dimensional exponential family and Fisher divergence. Inf Geom. 2024;7:609–22.
- 28. Li H, Yang Z, Tu F, Deng L, Han Y, Fu X, et al. Mutation divergence over space in tumour expansion. J R Soc Interface. 2023;20(208):20230542. pmid:37989227
- 29. Ash R. Information Theory. New York: Interscience Publishers; 1965.
- 30. Cover TM, Thomas JA. Elements of Information Theory. New Delhi: Wiley India (P.) Ltd; 1999.
- 31. Jeffreys H. Theory of Probability. Oxford: Clarendon Press; 1948.
- 32. Nath P. Remarks on some measures of inaccuracy of finite discrete generalized probability distributions, entropy and ergodic theory. Selecta Stat Canad. 1974;2:77–100.