## Abstract

Shannon entropy *H* and related measures are increasingly used in molecular ecology and population genetics because (1) unlike measures based on heterozygosity or allele number, these measures weigh alleles in proportion to their population fraction, thus capturing a previously ignored aspect of allele frequency distributions that may be important in many applications; (2) these measures connect directly to the rich predictive mathematics of information theory; (3) Shannon entropy is completely additive and has an explicitly hierarchical nature; and (4) Shannon entropy-based differentiation measures obey strong monotonicity properties that heterozygosity-based measures lack. We derive simple new expressions for the expected values of the Shannon entropy of the equilibrium allele distribution at a neutral locus in a single isolated population under two models of mutation: the infinite allele model and the stepwise mutation model. Surprisingly, this complex stochastic system for each model has an entropy expressible as a simple combination of well-known mathematical functions. Moreover, entropy- and heterozygosity-based measures for each model are linked by simple relationships that are shown by simulations to be approximately valid even far from equilibrium. We also identify a bridge between the two models of mutation. We apply our approach to subdivided populations which follow the finite island model, obtaining the Shannon entropy of the equilibrium allele distributions of the subpopulations and of the total population. We also derive the expected mutual information and normalized mutual information (“Shannon differentiation”) between subpopulations at equilibrium, and identify the model parameters that determine them. We apply our measures to data from the common starling (*Sturnus vulgaris*) in Australia. Our measures provide a test for neutrality that is robust to violations of equilibrium assumptions, as verified on real-world data from starlings.

**Citation:** Chao A, Jost L, Hsieh TC, Ma KH, Sherwin WB, Rollins LA (2015) Expected Shannon Entropy and Shannon Differentiation between Subpopulations for Neutral Genes under the Finite Island Model. PLoS ONE 10(6): e0125471. https://doi.org/10.1371/journal.pone.0125471

**Academic Editor:** Mark D. McDonnell, University of South Australia, AUSTRALIA

**Received:** July 30, 2014; **Accepted:** March 24, 2015; **Published:** June 11, 2015

**Copyright:** © 2015 Chao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the paper and its Supporting Information files.

**Funding:** The Mathematics Research Center (of Taiwan Ministry of Science and Technology), The Population Biology Foundation, and the Ministry of Science and Technology, Taiwan, Contract 100-2118-M007-006-MY3 (http://www.most.gov.tw/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Genetic analysis of populations has nearly always relied on measures based on expected heterozygosities or gene identities [1], because these link to variance and the binary nature of sexual reproduction and diploid inheritance. The corresponding *F*_{ST} measures and their various generalizations for subdivided populations have also played a central role in population genetics and evolutionary biology [2,3,4]. This approach emphasizes the frequent alleles by giving them much more weight than their population fraction, and multi-level hierarchical additive partitioning is not usually possible with heterozygosity-based measures [5–8].

Researchers in various disciplines have increasingly recognized that diversity within populations and compositional differentiation between populations cannot be completely characterized by a single measure. For example, ecologists have reached a consensus [9,10] that instead of one or a few diversity measures, it is best to use a multifaceted diversity measure parameterized by order *q* (which determines the measures’ emphasis on rare or common species), to completely characterize the species abundance distributions in ecological assemblages. By analogy, in addition to measures based on heterozygosity, complementary abundance-sensitive measures that are sensitive to less frequent alleles are needed to portray a more complete picture of allele frequency distribution or differentiation among populations.

This paper mainly focuses on Shannon entropy *H* and its differentiation measures. Shannon entropy *H* and its monotonic transformations, such as exp(*H*), connect directly to the rich mathematics of information theory initiated by Shannon [11], singularly appropriate for DNA information [12,13,14]. Unlike heterozygosity, information measures weigh alleles in proportion to their population fraction. Shannon entropy and its exponential are also the most popular summary statistics for ecological biodiversity [15], so their use in genetics would allow integrated ecological and genetic modeling.

Shannon entropy and its monotonic transformations can be partitioned into independent within- and between-subpopulation components. The between-group component, called mutual information, measures the differentiation of allele proportions between subpopulations as the mean reduction in uncertainty about allele identity when we learn the subpopulation from which the allele was drawn. In measuring compositional differentiation among subpopulations, the between-group component of Shannon entropy obeys stronger monotonicity properties than the between-group component of heterozygosity [8,16] (see Discussion). Mutual information is closely related to entropy-based measures of compositional differentiation among ecological communities [17,18].

Although entropy and mutual information have been widely used in information science and ecology after Shannon [11] and MacArthur [19], they were rarely applied to genetics until recently. Lewontin [20] pioneered the use of entropy and its decomposition in population genetics. Shannon entropy and mutual information have more recently been used to analyze a wide variety of genetic processes and patterns [12,13]. Examples cover a range of taxa, including viruses [21], bacteria [22], protist parasites [23], mosses [24], higher plants [25–31], invertebrates [14,32] and vertebrates including humans [33,34,35]. Many concentrate on microsatellites [12], but they have also assessed AFLPs [29], and single-nucleotide polymorphisms [14]. Recent theoretical uses of Shannon entropy and mutual information in genetics also include: dynamics of populations of genetically variable individuals in landscapes [36]; dynamics of molecules in gene expression networks [37,38,39]; analysis of gene-environment interactions, including genome wide association studies [40–44]; phylogenetic reconstruction [45,46,47]; mapping genes [48,49]; and derivations of classical population genetic results regarding drift and selection [50]. Outside genetics, there is much parallel work in species, phylogenetic and functional diversity involving entropy [51–54], so there may be further opportunities for expansion.

Given all these applications, it is vital to link Shannon entropy and mutual information to neutral genetic models. Previous attempts [12,14,55] fell short of general analytic expressions. For a single isolated population, Sherwin et al. [12] used the diffusion approximation to predict equilibrium Shannon entropy under the infinite allele model (IAM) or stepwise mutation model (SMM). However, these led to slowly-converging infinite series. For two populations connected by dispersal, with SMM, simulation results provided an empirical equation for mutual information at equilibrium, but no analytical equation was obtained [12]. Dewar et al. [14] derived a Taylor approximation to mutual information for bi-allelic genes only. Even with this incomplete armory of methods, Sherwin et al. [12] and Sherwin [13] showed firstly that for analysis of geographic subdivision and genetic exchange between sub-populations, mutual information readily yields an estimate of the dispersal rate per generation, and secondly that compared to all other approaches for analyzing such data, this method is robust to an extraordinarily wide range of dispersal rates and population sizes. The method has been used to assess current and historical subdivision in rainforest trees [25]. Thus mutual information might be more useful than heterozygosity-based measures for genetic estimation of dispersal, as noted by [12,13]. These considerations motivated us to derive analytic formulas for the general case of Shannon entropy and mutual information for genetic data.

Here we report remarkably simple expressions for expected Shannon entropy (and its exponential, “Shannon diversity” or the “effective number of alleles”) of the equilibrium allele distribution at a neutral locus in an isolated population under IAM or SMM. A bridge that connects the two models of mutation is identified. Our formulas and simulations also show for each model a robust relationship between entropy and heterozygosity under neutral models in equilibrium. Simulations show this relationship is often approximately valid even under some non-equilibrium conditions. Thus, the relationship between these two classes of measures may provide a test for neutrality that is relatively robust to violations of equilibrium assumptions.

We generalize this result to find the entropy of subdivided populations that follow the finite island model (FIM), and use the results to predict the mutual information between subpopulations at equilibrium under two models: IAM-FIM (FIM with mutation following IAM) and SMM-FIM (FIM with mutation following SMM). We can thus identify the model parameters that determine mutual information. We apply our measures to common starling (*Sturnus vulgaris*) data collected from their introduced range in Australia, to assess the robustness of the theoretical relationship we have found between entropy and heterozygosity.

## Methods

### Single isolated population under IAM

Assume *N* is the number of diploid individuals in an idealized population, *μ* is the mutation rate per generation, and there are *A* alleles at the target locus, with allele proportions (or fractions) *p*_{1}, *p*_{2},…, *p*_{A}. Throughout the paper, we assume that the population size is sufficiently large so that the distribution of allele proportions is essentially continuous. For non-ideal populations, *N* is replaced by effective population size. Shannon entropy is defined as

$${}^{1}H = -\sum_{i=1}^{A} p_i \log p_i,$$

and heterozygosity is

$${}^{2}H = 1 - \sum_{i=1}^{A} p_i^{2}.$$

Here we use the notation ^{1}*H* for Shannon entropy and ^{2}*H* for heterozygosity because these two measures are special cases, of order *q* = 1 and *q* = 2 respectively, of the generalized Tsallis or HCDT entropies ^{q}*H* [5,6,7] (see Discussion).

We first seek the expected value of Shannon entropy for neutral alleles under IAM in a single completely isolated population. Using the diffusion approximation, the allele proportion distribution under IAM is approximately Φ(*p*) = *θp*^{−1}(1−*p*)^{θ−1}, thus the equilibrium expectation value of any function ∑_{i} *h*(*p*_{i}), where *h*(*p*_{i}) tends to zero when *p*_{i} approaches zero, is given by the Ewens’ sampling formula [55]

$$E\Big[\sum_i h(p_i)\Big] = \int_0^1 h(p)\,\Phi(p)\,dp = \theta\int_0^1 h(p)\,p^{-1}(1-p)^{\theta-1}\,dp,$$

where *θ* = 4*Nμ*. Setting *h*(*p*) = *p*^{2} in the above integral, we obtain the well-known formula for the expected heterozygosity [56]:

$${}^{2}H = 1 - \theta\int_0^1 p\,(1-p)^{\theta-1}\,dp = \frac{\theta}{1+\theta}. \tag{1}$$

Setting *h*(*p*) = –*p* log *p*, we obtain the equilibrium expectation of Shannon entropy [12,55]:

$${}^{1}H = -\theta\int_0^1 (1-p)^{\theta-1}\log p\,dp.$$

The above can be expressed as an integral of the logarithm function with respect to a beta distribution, so we obtain a simple formula for the expected Shannon entropy as a function of *θ* (see S1 Appendix for details):

$${}^{1}H = \psi(\theta+1) - \psi(1) = \psi(\theta+1) + \gamma, \tag{2A}$$

where *ψ*(*z*) is the digamma function, and *γ* ≈ 0.5772 is Euler’s constant. It is remarkable that this complex stochastic system has an entropy expressible as a simple combination of well-known mathematical functions. If *θ* is greater than 2, then *ψ*(*θ*+1) can be accurately approximated by log(*θ*+0.5), so for many practical cases the expected Shannon entropy is approximately a linear function of log(*θ*+0.5):

$${}^{1}H \approx \log(\theta + 0.5) + \gamma. \tag{2B}$$

Substituting Eq 1 (that is, *θ* + 1 = 1/(1−^{2}*H*)) into Eq 2A or 2B leads to a direct relationship (or link) between expected Shannon entropy and heterozygosity at equilibrium:

$${}^{1}H = \psi\!\left(\frac{1}{1-{}^{2}H}\right) + \gamma, \tag{3A}$$

$${}^{1}H \approx \log\!\left(\frac{1}{1-{}^{2}H} - 0.5\right) + \gamma. \tag{3B}$$

Shannon entropy (^{1}*H*) and heterozygosity (^{2}*H*) can be transformed into an effective number of alleles (or diversity), ^{1}*D* and ^{2}*D*, which possess useful mathematical properties [8,57,58]. The transformation for heterozygosity is ^{2}*D* = 1/(1−^{2}*H*) = *θ* + 1, which is interpreted as the number of equi-frequent alleles that would give the same heterozygosity as that of the actual population. The transformation for Shannon entropy is ^{1}*D* = exp(^{1}*H*), which is interpreted as the number of equi-frequent alleles that would give the same Shannon entropy as that of the actual population [19,56].

We summarize all results for Shannon entropy (*q* = 1, Eq 2A) and heterozygosity (*q* = 2) in the second column of Table 1. When *θ* is greater than 2, the approximation (Eq 2B) leads to the following linear relationship between the Shannon-entropy-based and heterozygosity-based diversities:

$${}^{1}D \approx e^{0.5772}\,({}^{2}D - 0.5) \approx 1.781\,({}^{2}D - 0.5).$$

In this regime the Shannon diversity is itself a linear function of *θ*.
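As a numerical illustration of the IAM results in this subsection, the following minimal sketch evaluates the expected entropy *ψ*(*θ*+1) + 0.5772, its logarithmic approximation for *θ* > 2, and the heterozygosity implied by ^{2}*D* = *θ* + 1. The function names and the hand-written `digamma` helper (recurrence plus a standard asymptotic series) are assumptions made for illustration and are not part of the paper.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (illustrative helper)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x to shift x into the asymptotic regime.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def iam_heterozygosity(theta: float) -> float:
    """Expected heterozygosity under IAM: theta / (1 + theta)."""
    return theta / (1.0 + theta)


def iam_entropy(theta: float) -> float:
    """Expected Shannon entropy under IAM: psi(theta + 1) + gamma."""
    return digamma(theta + 1.0) + EULER_GAMMA


def iam_entropy_approx(theta: float) -> float:
    """Approximation intended for theta > 2: log(theta + 0.5) + gamma."""
    return math.log(theta + 0.5) + EULER_GAMMA


if __name__ == "__main__":
    for theta in (0.5, 2.0, 4.0, 10.0):
        h1 = iam_entropy(theta)
        print(f"theta={theta:5.1f}  2H={iam_heterozygosity(theta):.4f}  "
              f"1H={h1:.4f}  approx={iam_entropy_approx(theta):.4f}  "
              f"1D={math.exp(h1):.3f}")
```

For integer *θ* the exact expression equals the harmonic number 1 + 1/2 + … + 1/*θ*, which gives a convenient sanity check on the helper.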

### Single isolated population under SMM

Ohta and Kimura developed the framework of SMM, in which each mutation only creates adjacent alleles [59,60,61]. Here we consider the simplest form: the one-phase mutation model in which mutation is always only a single step, e.g. to one more or less repeat in microsatellite DNA. They used a diffusion approximation to obtain the allele proportion distribution:

$$\Phi(p) = \frac{p^{\alpha-1}(1-p)^{\theta-1}}{B(\alpha+1,\theta)},$$

where *θ* = 4*Nμ*, *α* = [(1+2*θ*)^{1/2}−1]/2, *B*(*x*,*y*) = Γ(*x*)Γ(*y*)/Γ(*x*+*y*) is the beta function, and Γ(*x*) is the gamma function. Their approach in [60] is reviewed in S1 Appendix to provide the necessary background for the generalization to the theory of multiple populations. As implied by their theory and also explained in S1 Appendix, if the parameter *α* tends to 0, then the allele distribution tends to that in IAM. This explicitly bridges between the allele proportion distributions of SMM and IAM, implying all properties derived from allele proportion distributions of the two models can also be connected by this bridge. For example, the expected heterozygosity ^{2}*H* derived by Kimura & Ohta is [60]:

$${}^{2}H = 1 - \frac{\alpha+1}{\alpha+\theta+1} = \frac{\theta}{\alpha+\theta+1}. \tag{4A}$$

When *α* is zero, the above reduces to the expected heterozygosity under IAM (in Eq 1). Using the relationship between *α* and *θ* (*α* = [(1+2*θ*)^{1/2}−1]/2; details in S1 Appendix), we can also express the expected heterozygosity as a function of *θ* only:

$${}^{2}H = 1 - \frac{1}{(1+2\theta)^{1/2}}. \tag{4B}$$

From the allele proportion distribution, the expected Shannon entropy for a population in mutation-drift equilibrium under SMM is approximately equal to

$${}^{1}H \approx -\frac{1}{B(\alpha+1,\theta)}\int_0^1 p^{\alpha}(1-p)^{\theta-1}\log p\,dp.$$

Again, this is the negative of an integral of the logarithm function with respect to a beta distribution. We thus have a simple analytic formula for expected Shannon entropy under SMM (see S1 Appendix for derivation of the following three equations):

$${}^{1}H = \psi(\alpha+\theta+1) - \psi(\alpha+1). \tag{5A}$$

When *α* is zero, the above reduces to the expected Shannon entropy under IAM (Eq 2A). From Eqs 4A, 4B and 5A, we obtain a simple relationship (or link) between ^{1}*H* and ^{2}*H* (see S1 Appendix for details):

$${}^{1}H \approx \log\frac{1 + {}^{2}H - ({}^{2}H)^{2}}{1 - {}^{2}H}, \tag{5B}$$

and between ^{1}*D* and ^{2}*D*:

$${}^{1}D \approx {}^{2}D + 1 - \frac{1}{{}^{2}D}.$$

We summarize all results for Shannon entropy (*q* = 1, Eq 5A) and heterozygosity (*q* = 2, Eqs 4A or 4B) in the second column of Table 1.
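The SMM quantities can be sketched numerically in the same spirit. The example below assumes the heterozygosity takes the form *θ*/(*α*+*θ*+1) and the entropy the form *ψ*(*α*+*θ*+1) − *ψ*(*α*+1); these forms are used here because they reduce to the IAM results when *α* = 0 and agree with the closed form 1 − 1/(1+2*θ*)^{1/2}. All function names, including the `digamma` helper, are illustrative assumptions.

```python
import math


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (illustrative helper)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 10.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def smm_alpha(theta: float) -> float:
    """alpha = [(1 + 2*theta)^(1/2) - 1] / 2."""
    return (math.sqrt(1.0 + 2.0 * theta) - 1.0) / 2.0


def smm_heterozygosity(theta: float) -> float:
    """Heterozygosity written in terms of alpha and theta (assumed form)."""
    alpha = smm_alpha(theta)
    return theta / (alpha + theta + 1.0)


def smm_heterozygosity_closed(theta: float) -> float:
    """Heterozygosity as a function of theta only: 1 - 1/sqrt(1 + 2*theta)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


def smm_entropy(theta: float) -> float:
    """Expected Shannon entropy under SMM (assumed form, see lead-in)."""
    alpha = smm_alpha(theta)
    return digamma(alpha + theta + 1.0) - digamma(alpha + 1.0)


if __name__ == "__main__":
    for theta in (0.5, 4.0, 12.0):
        print(f"theta={theta:5.1f}  alpha={smm_alpha(theta):.4f}  "
              f"2H={smm_heterozygosity(theta):.4f}  1H={smm_entropy(theta):.4f}")
```

With *θ* = 4 the sketch gives *α* = 1 and ^{2}*H* = 2/3 from both heterozygosity expressions, which is a quick consistency check between the two forms.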

### Multiple populations under IAM-FIM

In Wright’s finite island model (FIM) there are *n* idealized subpopulations each with size *N*, mutation rate *μ* per generation, and dispersal (or migration) rate *m* per generation, so that in each generation the alleles of any subpopulation include a proportion *m*/(*n*−1) randomly chosen from each of the other *n*−1 subpopulations. For notational simplicity, we follow Latter [62] and use *m*^{*} = *mn*/(*n*−1) instead of *m*. Note that FIM assumes that population size, dispersal rate and mutation rate are all constant across all subpopulations. Spatially homogeneous dispersal is also assumed [63].

As with a single isolated population, the distribution of the allele proportion *y* in the total population is [64]:

$$\Phi(y) = \theta_T\,y^{-1}(1-y)^{\theta_T-1}, \tag{6}$$

where *θ*_{T} = 4*N*_{T}*μ*, and *N*_{T} denotes the effective size of the total population, *N*_{T} = *Nn* + (*n*−1)/[4(*m*^{*}+*μ*)] under IAM-FIM ([65], p. 431). Therefore, all formulas for a single isolated population can be used for the total population if the parameter *θ* in a single population is replaced by the effective number of mutations per generation in the total population *θ*_{T}. We summarize the results in the third column of Table 1.
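To make the substitution of *θ*_{T} for *θ* concrete, the sketch below computes Latter's *m*^{*}, the effective total size *N*_{T} and *θ*_{T} from the model parameters, and then the total-population heterozygosity via ^{2}*D* = *θ* + 1. The function names are hypothetical, and the parameter values in the usage lines are the ones quoted for the simulations later in the paper.

```python
def latter_m_star(m: float, n: int) -> float:
    """Latter's rescaled dispersal rate m* = m*n / (n - 1)."""
    if n < 2:
        raise ValueError("the finite island model needs n >= 2 subpopulations")
    return m * n / (n - 1)


def total_effective_size(N: float, n: int, m: float, mu: float) -> float:
    """N_T = N*n + (n - 1) / [4 (m* + mu)] under IAM-FIM."""
    m_star = latter_m_star(m, n)
    return N * n + (n - 1) / (4.0 * (m_star + mu))


def theta_total(N: float, n: int, m: float, mu: float) -> float:
    """theta_T = 4 * N_T * mu."""
    return 4.0 * total_effective_size(N, n, m, mu) * mu


if __name__ == "__main__":
    # N = 10000, n = 4, mu = 0.005%, m = 0.1%
    N, n, mu, m = 10000, 4, 0.00005, 0.001
    theta_t = theta_total(N, n, m, mu)
    h2_total = theta_t / (1.0 + theta_t)  # heterozygosity with theta -> theta_T
    print(f"N_T = {total_effective_size(N, n, m, mu):.2f}")
    print(f"theta_T = {theta_t:.4f}, 2H_T = {h2_total:.4f}")
```

Note that *N*_{T} exceeds the census total *Nn* whenever dispersal is limited, which is why subdivision raises total-population diversity in this model.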

Barton & Slatkin [66] showed that the conditional distribution for allele proportion *x* in a subpopulation, given its proportion in the total population *y*, can be expressed as:
where *K* = 1/*B*(4*Nm*^{*}*y* + 1, 4*Nm*^{*}(1−*y*) + 4*Nμ*) is a normalizing constant, and *B* is the beta function defined earlier. The unconditional distribution of the proportion *x* can be obtained by integrating over all possible *y* values in the total population with distribution function given in Eq 6. Then the allele proportion distribution in a subpopulation is

Based on the above distribution, we can directly obtain the heterozygosity for a subpopulation as

(7B)

This formula (Eq 7B) was derived in Maruyama [67], using a recurrence formula for heterozygosity in the total and in a subpopulation; see Rousset [68] for a review. Our approach here is a direct method based on the allele proportion distribution.

Based on the distribution in Eq 7A, the exact formula for Shannon entropy for a subpopulation can be expressed as (see S2 Appendix):

(7C)

The above formula can be numerically evaluated using standard numerical integration software. Table 1 (last column) summarizes the exact formulas for expected subpopulation entropy and heterozygosity. A general approximation in terms of the digamma function is

(7D)

See S2 Appendix for the derivation and for further approximation formulas, valid under various conditions, that are used to examine some analytic properties; see Discussion for some special cases.

### Multiple populations under SMM-FIM

Based on the theory of Rousset [68] under SMM-FIM, we can express the expected heterozygosities of the total population and in a subpopulation as follows:

(8A)

(8B)

Note that if *m* = 0 and *n* = 1, then both heterozygosities in SMM-FIM reduce to that in a single population under the model SMM. That is, in the case *m* = 0, *n* = 1, we have ^{2}*H*_{S} = ^{2}*H*_{T} = 1−1/(1+8*Nμ*)^{1/2}; see Eq 4B.

As derived in S3 Appendix, the allele proportion distribution in the total population can be written as

$$\Phi(y) = \frac{y^{\alpha_T-1}(1-y)^{\theta_T-1}}{B(\alpha_T+1,\theta_T)},$$

where *θ*_{T} = 4*N*_{T}*μ*, *α*_{T} = [(1+2*θ*_{T})^{1/2}−1]/2 and *N*_{T} is the effective total population size under SMM-FIM. We can express *N*_{T} as a formula in terms of *m* and *μ*; see below for description. Comparing the allele proportion distributions of a single isolated population and of the total population of a subdivided population, we see that both have exactly the same form, but the parameters (*α*, *θ*) in an isolated population should be replaced by (*α*_{T}, *θ*_{T}) in the subdivided population. Thus, all results for an isolated population under SMM are also valid for the total population with population parameters (*α*_{T}, *θ*_{T}). For example, the expected heterozygosity in the total population can be expressed as ^{2}*H*_{T} = 1−1/(1+8*N*_{T}*μ*)^{1/2}, and *θ*_{T} and *α*_{T} can be expressed as functions of heterozygosities (see S3 Appendix):

$$\theta_T = \frac{1}{2}\left[\frac{1}{(1-{}^{2}H_T)^{2}} - 1\right], \qquad \alpha_T = \frac{1}{2}\left[\frac{1}{1-{}^{2}H_T} - 1\right]. \tag{8C}$$

Substituting Eq 8A into Eq 8C, we can express *θ*_{T} (and thus *N*_{T}) as well as *α*_{T} in terms of *m* and *μ*. Shannon entropy has the same formula as that given in Eqs 5A and 5B, with (*α*, *θ*) replaced by (*α*_{T}, *θ*_{T}). Table 1 (with column label “Total population” for the model SMM) summarizes the formula. Note that if *α*_{T} tends to 0, then all results reduce to those under IAM. This shows the fundamental connection between IAM and SMM formulas for the total population.

In S3 Appendix, we also derive the allele proportion distribution in a subpopulation. Consider an allele with allele proportion *x* in the subpopulation given its allele proportion in the total population is *y*, and let *ϕ*(*x*|*y*) be the conditional allele frequency distribution. Applying Wright’s formula [69], we obtain the conditional steady-state allele proportion distribution in a subpopulation:
(9A)
where *K*_{S} = 1/*B*(4*Nm*^{*}*y* + *α*_{S} + 1, 4*Nm*^{*}(1−*y*) + 4*Nμ*) and *α*_{S} can be expressed as a function of heterozygosities:

(It then follows from Eqs 8A and 8B that *α*_{S} can be expressed as a function of *m* and *μ*.) Thus we have the marginal allele proportion distribution in a subpopulation:

When both *α*_{T} and *α*_{S} tend to 0, the allele proportion distribution of SMM given in Eq 9B reduces to that of IAM given in Eq 7A. Based on this distribution, the expected heterozygosity of a subpopulation becomes

Also, we obtain the expected Shannon entropy for a subpopulation:

(9C)

As shown in S3 Appendix, this Shannon entropy for a typical subpopulation can be approximated by:

(9D)

When both *α*_{T} and *α*_{S} tend to 0, Eqs 9C and 9D reduce to Eqs 7C and 7D, respectively.

### Shannon differentiation measure

Based on the heterozygosities, the commonly used measure *G*_{ST} is expressed as *G*_{ST} = (^{2}*H*_{T}−^{2}*H*_{S})/(^{2}*H*_{T}). Since the value of *G*_{ST} is constrained by ^{2}*H*_{S}, a class of unconstrained *n*-assemblage differentiation measures called 1−*C*_{qn} was derived [18,70,71]. This class of differentiation measures is independent of within-group diversity. When *q* = 2, this measure gives Jost’s genetic differentiation measure *D* [58], which is a function of heterozygosities, i.e., *D* = 1−*C*_{2n} = (^{2}*H*_{T}−^{2}*H*_{S})/[(1−1/*n*)(1−^{2}*H*_{S})]. We can substitute the expectations for ^{2}*H*_{T} and ^{2}*H*_{S} (given in Table 1) into the formulas of *G*_{ST} and *D* to obtain the resulting measures in terms of the model parameters under IAM-FIM and SMM-FIM.

In the limit as *q* approaches unity, the differentiation measure 1−*C*_{qn} yields a function of Shannon entropies which is referred to as the Shannon differentiation measure throughout the paper:

$$\text{Shannon differentiation} = \frac{{}^{1}H_T - {}^{1}H_S}{\log n}.$$

The numerator ^{1}*H*_{T}−^{1}*H*_{S} is the mutual information (*MI*). Division by log *n* standardizes *MI* onto the unit interval if the *n* subpopulations are equally weighted. In the special case of two subpopulations, Shannon differentiation reduces to Horn’s [17] heterogeneity measure in ecology. Substituting the formulas ^{1}*H*_{T} and ^{1}*H*_{S} (given in Table 1) into the formula for *MI*, we obtain the Shannon differentiation formulas for IAM-FIM and SMM-FIM. Although the *MI* formulas in both models look complicated, we have provided some simplified formulas for IAM-FIM under some circumstances as summarized below (see Table B in S2 Appendix):

- When 4*Nm*^{*} >> 4*Nnμ* >> 0, *MI* is approximated by a simple function of 4*Nnμ*, *G*_{ST} and Jost’s *D* (Eq. B5 in S2 Appendix), revealing that both 4*N*(*m*^{*}+*μ*) (the main factor which determines *G*_{ST}) and *m*^{*}/(*nμ*) (the main factor which determines Jost’s *D*) affect Shannon differentiation. If the number of mutations is large enough, the ratio *m*^{*}/(*nμ*) becomes the dominating factor (see Discussion). Here *m*^{*}/(*nμ*) = *m*/[(*n*−1)*μ*] is the familiar scaled immigration rate [72].
- In the case in which 4*Nm*^{*} >> 4*Nnμ* and 4*Nnμ* is small, *MI* is a simple function of 4*Nnμ* and Jost’s *D* (Eq. B6 in S2 Appendix). In the extreme case that 4*Nnμ* tends to 0, *MI* approaches 0 and thus Shannon differentiation in this extreme case approaches 0.
- In the opposite case in which 4*Nnμ* >> 4*Nm*^{*}, *MI* is a simple function of *m*^{*}/(*nμ*) (Eq. B7 in S2 Appendix). When *m*^{*}/(*nμ*) tends to 0, *MI* approaches log(*n*) and Shannon differentiation approaches unity.
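The three differentiation measures defined in this section (*G*_{ST}, Jost’s *D* and Shannon differentiation) can also be evaluated directly from a set of allele proportion vectors. The sketch below is a plain plug-in calculation with equally weighted subpopulations; it does not include the bias corrections or the model-based expectations of Table 1, and the function names and example proportions are made up for illustration.

```python
import math


def shannon_entropy(p):
    """Shannon entropy -sum p_i log p_i (zero proportions contribute nothing)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def heterozygosity(p):
    """Heterozygosity 1 - sum p_i^2."""
    return 1.0 - sum(x * x for x in p)


def differentiation(subpops):
    """Plug-in G_ST, Jost's D, MI and Shannon differentiation.

    `subpops` is a list of allele proportion vectors over a common allele
    list; subpopulations are weighted equally.
    """
    n = len(subpops)
    if n < 2:
        raise ValueError("need at least two subpopulations")
    k = len(subpops[0])
    pooled = [sum(p[i] for p in subpops) / n for i in range(k)]
    h1_s = sum(shannon_entropy(p) for p in subpops) / n
    h1_t = shannon_entropy(pooled)
    h2_s = sum(heterozygosity(p) for p in subpops) / n
    h2_t = heterozygosity(pooled)
    mi = h1_t - h1_s  # mutual information
    return {
        "MI": mi,
        "shannon": mi / math.log(n),
        "G_ST": (h2_t - h2_s) / h2_t if h2_t > 0 else 0.0,
        "Jost_D": (h2_t - h2_s) / ((1.0 - 1.0 / n) * (1.0 - h2_s)),
    }


if __name__ == "__main__":
    # Two subpopulations sharing no alleles (hypothetical proportions).
    disjoint = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    print(differentiation(disjoint))
```

In the disjoint example Shannon differentiation and Jost’s *D* both reach 1 while *G*_{ST} stays at 1/3, which illustrates how *G*_{ST} is constrained by the within-subpopulation heterozygosity.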

We plot the performances of *G*_{ST}, Shannon differentiation, and Jost’s *D* under IAM-FIM (Fig 1) and SMM-FIM (Fig 2) as functions of *Nm* (the average number of dispersals per generation), *Nμ* (the average number of mutations per generation) and *m*^{*}/(*nμ*) (the balance between pairwise dispersal and mutation). The Shannon differentiation measure and Jost’s *D* always exhibit consistent patterns. For both mutation models, the two measures are increasing functions of *Nμ*, and decreasing functions of *Nm* and of *m*^{*}/(*nμ*). Although the classic *G*_{ST} measure is also decreasing in *Nm* and in *m*^{*}/(*nμ*), it exhibits a strikingly different pattern as a function of the number of mutations, being generally decreasing or stable. The center row of Figs 1 and 2 shows that, for the Shannon and Jost measures, mutation-driven differentiation is more effective when dispersal is low. In contrast, *G*_{ST} is either insensitive to mutation or, at very low mutation rates, has its level of differentiation set by dispersal (compare the three panels of the center row).

**Fig 1.** Plots of the Shannon differentiation (i.e., normalized mutual information, solid lines), Jost’s differentiation measure *D* (dashed lines), and *G*_{ST} (dash-dotted line) under IAM-FIM as a function of *Nm* (upper panels), *Nμ* (middle panels), and *m*^{*}/(*nμ*) (lower panels).

**Fig 2.** Plots of the Shannon differentiation (i.e., normalized mutual information, solid lines), Jost’s differentiation measure *D* (dashed lines), and *G*_{ST} (dash-dotted line) under SMM-FIM as a function of *Nm* (upper panels), *Nμ* (middle panels), and *m*^{*}/(*nμ*) (lower panels).

In fact, when *m* > *μ*, as in the middle right panel of Figs 1 and 2, *G*_{ST} becomes nearly independent of *Nμ*, unlike the other two measures. Under IAM-FIM, there is also a dramatic contrast between *G*_{ST} and the two measures when 4*Nnμ* >> 4*Nm*^{*} → 0 (as in the middle left panel of Fig 1). In this case, *MI* approaches log *n*, and thus Shannon differentiation approaches 1, and Jost’s *D* also approaches 1. However, *G*_{ST} values are very low and tend to 0 as *Nμ* becomes large. See S2 Appendix for more analytic formulas for mutual information under IAM-FIM.

## Simulation

We did simulations to test the robustness of our predicted relationship between Shannon measures and heterozygosity-based measures. Under IAM, the relationship for an isolated population is given by Eq 3A, ^{1}*H* = ψ[1/(1−^{2}*H*)]+0.5772; this is also valid for the total population under IAM-FIM. Under SMM, Shannon entropy and heterozygosity for an isolated population are linked through the equation ^{1}*H* ≈ log{[1+^{2}*H*−(^{2}*H*)^{2}]/(1−^{2}*H*)} (Eq 5B), which holds for the total population under SMM-FIM (Table 1). For a subpopulation, the expected Shannon entropy is a function of not only the expected subpopulation heterozygosity but also the expected total-population heterozygosity. Under IAM-FIM, we propose an explicit link in terms of an integral involving ^{2}*H*_{T} and ^{2}*H*_{S} (Eq. D7 of S4 Appendix). Under SMM-FIM, the link is not explicit, so numerical procedures are needed; see S4 Appendix for details.
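The two single-population links quoted above lend themselves to a quick numerical check: given a heterozygosity value, each model predicts a Shannon entropy that can be compared with an observed one. The sketch below implements only those two links (IAM via the digamma function, SMM via the logarithmic approximation); the subpopulation links of S4 Appendix are not reproduced. The function names, the `digamma` helper and the `observed` values in the usage lines are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def digamma(x: float) -> float:
    """Digamma function psi(x) for x > 0 (illustrative helper)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 10.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def predicted_entropy_iam(h2: float) -> float:
    """IAM link: 1H = psi[1 / (1 - 2H)] + 0.5772."""
    return digamma(1.0 / (1.0 - h2)) + EULER_GAMMA


def predicted_entropy_smm(h2: float) -> float:
    """SMM link: 1H ~ log{[1 + 2H - (2H)^2] / (1 - 2H)}."""
    return math.log((1.0 + h2 - h2 * h2) / (1.0 - h2))


if __name__ == "__main__":
    # Hypothetical observed values, for illustration only.
    observed = {"h2": 0.85, "h1": 2.30}
    for name, link in (("IAM", predicted_entropy_iam), ("SMM", predicted_entropy_smm)):
        pred = link(observed["h2"])
        print(f"{name}: predicted 1H = {pred:.4f}, observed - predicted = "
              f"{observed['h1'] - pred:+.4f}")
```

Both links return zero entropy at zero heterozygosity, as they should for a population fixed for a single allele.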

We used simulations to calculate Shannon entropy in two ways: directly from the simulated allelic data, and predicted from heterozygosity via the equations for FIM, as described in the preceding paragraph and also noted in the figure caption. Representative outputs show the simulated curve and the curve predicted from heterozygosity for the total population (Fig 3A) and for a subpopulation (Fig 3B) under IAM-FIM. The corresponding plots for SMM-FIM are shown in Fig 3C and 3D. These simulation results were averaged over 5 loci, each of which starts out fixed for a single allele. Our simulation results showed that the Shannon entropy curve predicted from the heterozygosity values for IAM-FIM is slightly lower than the simulated curve, but the two curves become very close even before equilibrium is reached, revealing that the relationship is also approximately valid before equilibrium is attained. For SMM-FIM, the simulated curve and the curve predicted from heterozygosity match very closely, and almost overlap from the initial stages, when the population is fixed for a single allele shared by all subpopulations.

**Fig 3.** Simulation results showing stochastic behavior of the average (over 5 loci) of total-population and subpopulation Shannon entropies for *N* = 10000, *n* = 4, *μ* = 0.005%, *m* = 0.1%. The horizontal line in each panel represents the theoretical equilibrium value. The initial condition was set to be just one allele (shared by all) in each subpopulation. (a) The stochastic pattern for the total-population entropy ^{1}*H*_{T} is shown as the black curve, and the red curve is ^{1}*H*_{T} = *ψ*[1/(1−^{2}*H*_{T})]+0.5772, which is the ^{1}*H*_{T} value calculated as a function of heterozygosity under IAM-FIM. (b) The pattern for subpopulation entropy ^{1}*H*_{S} is shown as the black curve, and the red curve is obtained via a link from heterozygosity (see Eq. D7 in S4 Appendix) under IAM-FIM. In both (a) and (b), the processes converge after roughly 40000 generations, but the two curves become close before equilibrium (around 20000 generations). (c) The stochastic pattern for total-population entropy ^{1}*H*_{T} under SMM-FIM is shown as the black curve, and the red curve is log{[1+^{2}*H*_{T}−(^{2}*H*_{T})^{2}]/(1−^{2}*H*_{T})}, which is the ^{1}*H*_{T} value calculated as a function of heterozygosity. (d) The pattern for subpopulation entropy ^{1}*H*_{S} is shown as the black curve, and the red curve is obtained via a link from heterozygosity (see S4 Appendix for the link). The relationship between heterozygosity and Shannon entropy holds at all stages of the stochastic process under SMM-FIM.

## Empirical Test

Simulations in the preceding section show that our predicted relationship between Shannon entropy and heterozygosity is approximately valid under some non-equilibrium conditions; and for SMM the relationship is valid even in nearly all stages of the process. By examining real populations of various ages, we can test the robustness of these relationships in practice. Starlings were introduced to south-eastern Australia in the mid-19th century [73–76] and provide a good test case, having several populations of different ages. Since the 1970s, starlings have begun to invade Western Australia [77] and have been intensively controlled since that time.

Rollins [74,75] used genetic markers to trace the possible invasion pathways. Using starlings captured in 17 localities throughout their Australian range, four genetically distinct starling subpopulations were identified and their localities are shown in the footnotes of Table A (S5 Appendix), and are numbered 1−4 in order from west to east. Subpopulations 1 and 2 are the youngest, being established approximately 5 and 35 years (respectively) before the time of sampling, while subpopulations 3 and 4 are older, having been established in the 19^{th} century. Since generation time is about three years [74], the subpopulations cannot be in equilibrium or near equilibrium, especially the two youngest populations, 1 and 2.

We consider two types of data, which have different expected models [74]. (1) A locus which is expected to follow the IAM: Dopamine receptor D4 (*DRD4*) allele frequency data for the four subpopulations (Table A of S5 Appendix). (2) Three loci which are expected to follow the SMM: microsatellite data for 3 loci for the four subpopulations (Table B of S5 Appendix) [75]. While we expect these microsatellites to be selectively neutral, there is some evidence of selection on *DRD4* in other avian taxa [78,79]. Rollins [74] explicitly tested the *DRD4* data used here for departures from neutrality (Tajima’s *D*, Fu’s *F*) and found no evidence of selection at this locus in the starlings included in our analysis. The graded series of Australian starling populations of different known ages provided us with the possibility of investigating approach to equilibrium, and robustness to non-equilibrium situations [80].

Based on allele frequencies (*DRD4* and microsatellite data), statistical estimation techniques are applied to obtain bias-corrected estimates of heterozygosity, Shannon entropy, Shannon differentiation and other parameters [53,81]; these estimates are referred to as “empirical” (or “estimated”) values in the tables and in the following discussion. The bias correction is necessary because parameters/measures based directly on observed frequencies are biased. The statistical estimation methods for calculating the empirical values from sample data are summarized in S4 Appendix. The procedures to obtain the expected values under different models (IAM, SMM, IAM-FIM, and SMM-FIM) are also summarized in S4 Appendix, and are briefly described below.

### DRD4 data

Using the *DRD4* data, we did two independent analyses. (a) We performed analysis under IAM by treating each of the four subpopulations as completely isolated from each other; all results are summarized in Table 2 and described below. (b) We treated the four populations as partially-connected subpopulations under IAM-FIM; all results and comparisons are summarized in Table 3.

**(a) Treating each of the four subpopulations as an isolated population following IAM for mutation (Table 2).** The sample sizes for *DRD4* data from subpopulations 1−4 were 146, 52, 486 and 176 respectively, revealing 16, 11, 31 and 25 alleles, a total of 38 different alleles over all subpopulations (S5 Appendix). Table 2 gives the empirical Shannon entropy values along with estimated s.e. (to quantify sampling errors) from subpopulation 1 to subpopulation 4. The empirical Shannon entropies are 2.0539 (s.e. 0.0952), 2.2415 (s.e. 0.1139), 2.6845 (s.e. 0.0460), and 2.7638 (s.e. 0.0815) respectively, which shows an increasing pattern from west to east, consistent with the history of invasion. In our analysis, all s.e. estimates were obtained by a bootstrap method based on 1000 resamples generated from the observed allele frequency distribution. Table 2 also gives the empirical heterozygosity values and s.e. from subpopulations 1−4, based on unbiased estimation theory (see S4 Appendix).
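The bootstrap step described above can be sketched as follows. This is a minimal illustration, not the authors' S7 Appendix script: it resamples allele counts from the observed frequency distribution and, as a simplifying assumption, uses the plug-in entropy rather than the bias-corrected estimator of [53]. The function names and the example counts are hypothetical.

```python
import math
import random


def shannon_entropy(counts):
    """Plug-in Shannon entropy (natural log) from allele counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def bootstrap_se(counts, estimator=shannon_entropy, n_boot=1000, seed=0):
    """Bootstrap s.e. of `estimator`, resampling from the observed frequencies."""
    rng = random.Random(seed)
    n = sum(counts)
    alleles = range(len(counts))
    estimates = []
    for _ in range(n_boot):
        # Draw n gene copies with probabilities proportional to observed counts.
        draw = rng.choices(alleles, weights=counts, k=n)
        resampled = [0] * len(counts)
        for a in draw:
            resampled[a] += 1
        estimates.append(estimator(resampled))
    mean = sum(estimates) / n_boot
    var = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return math.sqrt(var)


# Hypothetical allele counts for illustration only (not the starling data).
example_counts = [60, 40, 25, 15, 6]
example_se = bootstrap_se(example_counts)
```

Passing a different `estimator` (for example a bias-corrected entropy estimator) would reuse the same resampling loop.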

The IAM expected values in Table 2 are obtained via the relationship (Eq 3A) ^{1}*H* = *ψ*[1/(1−^{2}*H*)]+0.5772 under IAM within each subpopulation, using the assumptions of equilibrium and complete isolation. Both of these assumptions are likely to be violated by the starling subpopulations. Nevertheless, our relationship still accurately predicts their observed entropies (except for subpopulation 2, due to its relatively low sample size). The proportional differences (PD) between observed and predicted entropy values for subpopulations 1−4 are respectively 1.88%, 11.79%, 5.26% and 0.45%. Except for subpopulation 2 (in which the sample size is relatively low and thus the s.e. of the entropy estimate is relatively high), this relationship therefore appears to be robust for IAM loci in real populations, even if they are far from equilibrium and even if they are not completely isolated. Here the bootstrap method can take into account model uncertainty in the estimation procedures. Thus the uncertainty in estimating heterozygosity was incorporated in our estimated error of the expected Shannon entropies. Note that the s.e. for the expected Shannon entropy (via estimated heterozygosity under FIM assumptions and under equilibrium status) in each case is higher than the s.e. of the estimated Shannon entropy (based on data only), due to the propagation effect of model uncertainty on the expected Shannon entropies. This also holds in nearly all cases in the following discussions.
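As an illustration of how Eq 3A can be evaluated, the sketch below computes the expected entropy from a heterozygosity value using only the Python standard library. The digamma routine (upward recurrence followed by an asymptotic series) and all function names are assumptions made for this example; the 0.5772 in Eq 3A is the Euler–Mascheroni constant, used here at full precision.

```python
import math

EULER_GAMMA = 0.5772156649015329  # the constant 0.5772 in Eq 3A


def digamma(x):
    """Digamma function psi(x) for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def iam_entropy_from_heterozygosity(h2):
    """Eq 3A: expected Shannon entropy for heterozygosity h2, 0 <= h2 < 1."""
    return digamma(1.0 / (1.0 - h2)) + EULER_GAMMA


# Hypothetical heterozygosity value, for illustration only.
example_entropy = iam_entropy_from_heterozygosity(0.9)
```

A quick sanity check on the formula: a heterozygosity of 0 gives an entropy of 0, and a heterozygosity of 0.5 gives *ψ*(2) + 0.5772 = 1.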

**(b) Assuming IAM-FIM for the four subpopulations (Table 3).** Under IAM-FIM, Table 3 first gives the empirical results of Shannon entropy and heterozygosity for the total population, subpopulation (the mean of the empirical subpopulation Shannon entropies) and three related differentiation measures: Shannon’s differentiation, Jost’s *D* and *G*_{ST}. See S4 Appendix for details. The difference between the empirical Shannon entropy for the total population (2.7444) and subpopulation (2.4359) is the empirical mutual information. Thus, it follows from Eq 10 that the estimated Shannon differentiation is (2.7444–2.4359) / log4 = 0.2225. Based on the empirical heterozygosities, Jost’s differentiation measure *D* is estimated to be 0.4407, while *G*_{ST} is much lower (0.0485).
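The arithmetic above can be reproduced with a few lines of Python. The Shannon differentiation function follows the calculation quoted in the text (mutual information divided by the log of the number of subpopulations). The *G*_{ST} and Jost’s *D* functions use the standard heterozygosity-based definitions and are assumed, not confirmed, to match the estimators in S4 Appendix; all function names are hypothetical.

```python
import math


def shannon_differentiation(h_total, h_sub_mean, n_subpops):
    """Normalized mutual information: (H_T - H_S) / log(n)."""
    return (h_total - h_sub_mean) / math.log(n_subpops)


def g_st(het_total, het_sub):
    """Standard G_ST from total and mean within-subpopulation heterozygosity."""
    return (het_total - het_sub) / het_total


def jost_d(het_total, het_sub, n_subpops):
    """Standard Jost's D from heterozygosities and number of subpopulations."""
    return ((het_total - het_sub) / (1.0 - het_sub)
            * n_subpops / (n_subpops - 1))


# Empirical DRD4 entropies quoted in the text, four subpopulations.
drd4_differentiation = shannon_differentiation(2.7444, 2.4359, 4)
```

Rounded to four decimals, `drd4_differentiation` matches the 0.2225 reported above.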

For the IAM-FIM expected values, the link between heterozygosity and Shannon entropy for an isolated population (Eq 3A) can also be applied to the total population. This gives an expected total-population entropy of 2.9466, with a PD of 6.86% when it is compared with the empirical total-population entropy. The expected subpopulation entropy was computed from the total and subpopulation heterozygosities; see Eq. D7 of S4 Appendix for details. Although the model may be wrong and equilibrium is unlikely to have been attained, the total-population and subpopulation Shannon entropies are still predicted from heterozygosities with very high relative accuracy (PD = 6.86% for the total-population entropy and −1.84% for the subpopulation entropy). However, due to over-prediction for ^{1}*H*_{T} (positive PD) and under-prediction for ^{1}*H*_{S} (negative PD), the calculated Shannon differentiation is subject to a relatively large PD (44.4%) for these data. The large PD could derive from various departures from the model, such as selection (although there is no evidence of differential fitness of the *DRD4* genotypes [74]) and the discrepancy between heterozygosity and Shannon entropy (see Fig 3A and 3B), or from stochasticity, heightened by the availability of only a single IAM locus, which is discussed further below in comparison to the SMM results.
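The proportional difference is not defined explicitly in this section. The sketch below assumes PD = (expected − empirical) / expected, an inference from the reported numbers: this form reproduces the 6.86% quoted for the total-population entropy, whereas dividing by the empirical value would give about 7.37%. The function name is hypothetical.

```python
def proportional_difference(expected, empirical):
    """Assumed PD definition: (expected - empirical) / expected.

    Positive values indicate over-prediction by the model,
    negative values indicate under-prediction.
    """
    return (expected - empirical) / expected


# Total-population DRD4 entropies quoted in the text.
pd_total = proportional_difference(2.9466, 2.7444)
pd_total_percent = 100.0 * pd_total
```

Under this sign convention a positive PD corresponds to over-prediction, consistent with the description of ^{1}*H*_{T} above.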

### Microsatellite data

As with the *DRD4* data, we performed two independent analyses based on the allele frequencies for the three microsatellite loci (Locus Sta213, Locus Sta294, and Locus Sta308). (a) We first estimated parameters under SMM separately for each locus by treating each of the four subpopulations as isolated from each other. (b) We then treated the four populations as partially-connected subpopulations under SMM-FIM, again separately for each locus. The average results for the three loci for the two analyses are shown respectively in Tables 4 and 5. (The results for each locus are provided in S5 Appendix.)

**(a) Treating each of the four subpopulations as an isolated population following SMM for mutation (Table 4).** For the three microsatellite loci (Locus Sta213, Locus Sta294, and Locus Sta308), the sample sizes for subpopulations 1–4 are 296, 76, 620 and 274 respectively (except that the sample size for Locus Sta294 in subpopulation 3 is 616). The average numbers of alleles for the four subpopulations are respectively 8.33 (average of 9, 6, 10 for the three loci), 7.67 (9, 7, 7), 10.33 (13, 7, 11) and 11.67 (14, 7, 14); see S5 Appendix for data details. The empirical values tabulated in Tables 4 and 5 were obtained by applying the same methods described for the *DRD4* data; see S4 Appendix for formulas.

Table 4 shows that the averages of the empirical Shannon entropy values (over 3 loci) for subpopulations 1–4 are respectively 1.6115 (s.e. 0.0227), 1.7696 (s.e. 0.0484), 2.0344 (s.e. 0.0142) and 2.1313 (s.e. 0.0215), revealing the expected increase with subpopulation age from west to east. Again, the s.e. of the estimated Shannon entropy in subpopulation 2 is higher than those in the other three areas, due to the relatively low sample size in subpopulation 2. The corresponding empirical heterozygosity values from subpopulations 1–4 also exhibit an increasing trend from west to east, as expected from invasion history.

If these microsatellites follow the single-phase isolated SMM within each subpopulation, Shannon entropy should be related to heterozygosity through the equation (Eq 5B): ^{1}*H* ≈ log{[1+^{2}*H*−(^{2}*H*)^{2}]/(1−^{2}*H*)}. Table 4 shows that the average PD values (over 3 loci) between the entropies predicted from heterozygosity and the empirical entropies are 0.65%, −1.71%, −0.36% and −1.07%. The predicted values match the empirical values very closely, even for the youngest populations. Thus, as we have demonstrated in Fig 3C and 3D, for loci that obey single-phase SMM, the relationship between Shannon entropy and heterozygosity applies even to non-equilibrium populations and even if they are not completely isolated, in agreement with our simulation results.
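Eq 5B is straightforward to evaluate directly. The following minimal sketch (the function name is hypothetical) returns the entropy predicted from a given heterozygosity under the single-phase SMM, using natural logarithms as elsewhere in the paper.

```python
import math


def smm_entropy_from_heterozygosity(h2):
    """Eq 5B: approximate Shannon entropy for heterozygosity h2, 0 <= h2 < 1."""
    return math.log((1.0 + h2 - h2 * h2) / (1.0 - h2))


# Hypothetical heterozygosity value, for illustration only.
example_smm_entropy = smm_entropy_from_heterozygosity(0.5)
```

As with the IAM relationship, a heterozygosity of 0 maps to an entropy of 0, and the predicted entropy grows without bound as heterozygosity approaches 1.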

**(b) Assuming SMM-FIM for the four subpopulations (Table 5).** Table 5 gives the average of the empirical results for the total population, subpopulation and three differentiation measures, based on the same methods described for Table 3. As with the *DRD4* data, the empirical *G*_{ST} (0.0512) is much lower than the Shannon differentiation value (0.1501) and Jost’s *D* (0.2983). Applying the same link between heterozygosity and entropy for an isolated population to the total population, we obtain the SMM-FIM expected value of 2.0707 for the total-population entropy, which is very close to the empirical value of 2.0948 (PD = −1.17%). The mean within-subpopulation entropy is also very accurately predicted (PD = −0.50%) from our SMM-FIM theory given in S4 Appendix, even though nearly all the assumptions of the FIM model may not be satisfied, as in this starling population (which is far from equilibrium, with unequal subpopulation sizes, variable number of subpopulations through time, and spatially non-homogeneous migration). The expected Shannon differentiation value is 0.1395, which agrees well with the empirical value of 0.1501 (PD = −7.62%). This good performance compared to the Shannon differentiation for IAM (Table 3, with PD 44.4%) may be simply due to the averaging over three loci in the SMM case (Table 5). This can be seen from the improvement in performance relative to cases where each locus is analysed separately. The results for each locus are shown in S5 Appendix (Tables C-E), where the PDs for Shannon differentiation are −18.5%, 16.8% and −15.4% for the three loci. We also note that, according to our simulations, the link between heterozygosity and Shannon entropy under SMM-FIM (Fig 3C and 3D) is very robust and valid in nearly all stages. The link applies even in populations that violate two conditions: being far from equilibrium, and being connected by some dispersal.

## Conclusions and Discussion

Geneticists have long known that, in an isolated population at equilibrium, the heterozygosity at a neutral locus under IAM is a simple function of the fundamental biodiversity parameter *θ* (= 4*Nμ*). Here we have shown that for neutral alleles in equilibrium, Shannon entropy is also a simple function of *θ* (see Eqs 2A and 2B). It follows that Shannon entropy is also a simple function of heterozygosity (Eq 3A). This provides a novel test for neutrality: if the observed entropy is significantly different from the entropy predicted on the basis of the observed heterozygosity, then the locus violates the assumptions of the neutral model or the IAM mutation model. We have also shown that, in an isolated population at equilibrium under a single-phase SMM, Shannon entropy is a simple function of *θ* (Eq 5A), and a simple link between heterozygosity and Shannon entropy also exists (Eq 5B). This yields a similar test for neutrality under SMM. All theory for IAM and SMM is valid not only for an isolated population but also for the total population under FIM by replacing *θ* with *θ*_{T} (= 4*N*_{T} *μ*), where *N*_{T} denotes the effective size of the total population in a finite island model; see Table 1 for a summary.
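The step from *θ* to Eq 3A under IAM can be written out as a short worked equation. This sketch takes the classical equilibrium heterozygosity under IAM, *θ*/(1+*θ*) [56], and substitutes it into Eq 3A. Eq 2A is not reproduced in this section, so the resulting expression in *θ* should be read as what Eq 3A implies, assumed to agree with Eq 2A; *γ* denotes the constant written as 0.5772 in Eq 3A.

```latex
\begin{aligned}
{}^{2}H &= \frac{\theta}{1+\theta}
  \quad\Longrightarrow\quad \frac{1}{1-{}^{2}H} = 1+\theta, \\
{}^{1}H &= \psi\!\left(\frac{1}{1-{}^{2}H}\right) + \gamma
         = \psi(1+\theta) + \gamma, \qquad \gamma \approx 0.5772 .
\end{aligned}
```

For example, *θ* = 1 gives a heterozygosity of 0.5 and an entropy of *ψ*(2) + *γ* = 1.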

In Fig 3, we have demonstrated for partially-connected subpopulations under IAM-FIM and SMM-FIM that our new link between entropy and heterozygosity turns out to be quite robust for neutral alleles and is satisfied even before equilibrium is attained, at least when the initial population has low diversity (as is often the case after a founding event). Our simulations and empirical data from starlings introduced to Australia both suggest the robustness of our proposed links.

In Table 1, we summarized all formulas derived in this paper for two mutation models: IAM and SMM. In this paper, we have provided a bridge between the two models. As shown in Table 1, when the parameter *α* in SMM tends to 0 for an isolated population, all formulas reduce to those for IAM. For the total population, when *α*_{T} in SMM-FIM tends to 0, all formulas for SMM-FIM reduce to those for IAM-FIM. For a subpopulation, when both *α*_{T} and *α*_{S} tend to 0, all formulas for SMM-FIM reduce to those for IAM-FIM. Generally, all properties of these two mutation models based on allele proportion distributions can be connected by this bridge.

We are also now able to link Shannon differentiation (normalized mutual information) to the parameters of the finite island model at equilibrium under both IAM-FIM and SMM-FIM. Shannon differentiation, like Jost’s *D*, is zero when the allele distributions are identical in all subpopulations, and is unity when the subpopulations share no alleles. Figs 1 and 2 reveal that Shannon differentiation increases with mutation rate and decreases with dispersal rate if all other parameters are fixed. In Table B (S2 Appendix), we tabulate the expected values of *G*_{ST}, Jost’s *D* and some simplified formulas for the mutual information under IAM with equilibrium in the FIM. The expected value of *G*_{ST} is determined by the sum of dispersal and mutation, *N*(*m*^{*}+*μ*), whereas the expected value of Jost’s *D* is determined by the scaled immigration rate [58,72], a ratio between pairwise dispersal rate and mutation rate, as expressed by the factor *m*^{*}/(*nμ*) = *m*/[(*n*–1)*μ*]. When 4*Nm*^{*} >> 4*Nnμ* >> 0 or 4*Nnμ* >> 4*Nm*^{*}, Shannon differentiation simplifies greatly, revealing the factors that control it. In the latter case, Shannon differentiation is nearly controlled by the ratio *m*^{*}/(*nμ*), like Jost’s *D*; see the last formula in Table B of S2 Appendix. In the former case, Shannon differentiation is determined by a combination of 4*Nnμ*, *G*_{ST} and *D*, or equivalently, by a combination of *N*(*m*^{*}+*μ*) and *m*^{*}/(*nμ*). However, the dependence on *N*(*m*^{*}+*μ*) is very weak when 4*Nnμ* >> 2, and thus the main factor controlling entropy differentiation is the ratio *m*^{*}/(*nμ*); see S2 Appendix for details.
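The two boundary cases mentioned above (zero for identical allele distributions, unity for subpopulations sharing no alleles) can be checked numerically. The sketch below computes normalized mutual information directly from allele frequencies, assuming equally weighted subpopulations and plug-in entropies with no bias correction; the function names and the toy frequencies are hypothetical.

```python
import math


def entropy(freqs):
    """Shannon entropy (natural log) of a frequency vector."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def shannon_differentiation_from_freqs(subpop_freqs):
    """Normalized mutual information for equally weighted subpopulations.

    subpop_freqs: list of allele-frequency lists, one per subpopulation,
    all using the same allele order.
    """
    n = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # Total-population frequencies are the unweighted mean across subpopulations.
    pooled = [sum(sp[i] for sp in subpop_freqs) / n for i in range(n_alleles)]
    h_total = entropy(pooled)
    h_sub = sum(entropy(sp) for sp in subpop_freqs) / n
    return (h_total - h_sub) / math.log(n)


# Toy frequencies: identical subpopulations vs. no shared alleles.
identical = shannon_differentiation_from_freqs([[0.7, 0.3], [0.7, 0.3]])
disjoint = shannon_differentiation_from_freqs(
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])
```

With these toy inputs, `identical` evaluates to 0 and `disjoint` to 1, up to floating-point error.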

In statistics, information theory, ecology, and physics, Shannon entropy has been generalized into numerous parametric families of “generalized entropies”, which vary in the weight they give to common versus rare alleles (or their analogs in other disciplines). The Tsallis or HCDT generalized entropies of order *q*, and the Rényi entropies [82], are two widely used families. Each family of generalized entropies generates a smooth curve when plotted as a function of the order parameter *q*. When *q* = 0 the generalized entropy ignores allele frequencies (it is a function only of allele number). As *q* increases, the generalized entropies are increasingly sensitive to allele frequencies. At *q* = 1 we have Shannon entropy which weighs alleles according to their population share. Moving beyond *q* = 1, the entropies increasingly emphasize the most abundant alleles. When *q* = 2 the measures use the same allele weighting as heterozygosity. This graph of generalized entropy as a function of *q* is called an “entropy spectrum”, and there is a corresponding “diversity profile” when the entropies are converted to effective number of alleles before plotting them. Either one of these curves completely characterizes a given allele proportion distribution, and carries the same information as the Ewens’ probability density function [55]. In S1 Appendix, we have provided the theoretical expressions for these entropy spectra or diversity profiles in terms of model parameters, under IAM and SMM. This provides a new way of characterizing the neutral equilibrium allele proportion distribution.
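To make the role of the order parameter *q* concrete, the sketch below evaluates the Tsallis (HCDT) entropy, the Rényi entropy and the corresponding effective number of alleles (Hill number) for a given allele frequency vector. These are the standard textbook definitions, assumed to match those used in S1 Appendix; the function names and example frequencies are hypothetical.

```python
import math


def shannon(freqs):
    """Shannon entropy (natural log), the q = 1 limit of both families."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def power_sum(freqs, q):
    """Sum of p^q over alleles that are actually present."""
    return sum(p ** q for p in freqs if p > 0)


def tsallis_entropy(freqs, q):
    """Tsallis (HCDT) entropy of order q."""
    if abs(q - 1.0) < 1e-12:
        return shannon(freqs)
    return (1.0 - power_sum(freqs, q)) / (q - 1.0)


def renyi_entropy(freqs, q):
    """Renyi entropy of order q."""
    if abs(q - 1.0) < 1e-12:
        return shannon(freqs)
    return math.log(power_sum(freqs, q)) / (1.0 - q)


def hill_number(freqs, q):
    """Effective number of alleles of order q (exponential of Renyi entropy)."""
    return math.exp(renyi_entropy(freqs, q))


# Hypothetical allele frequencies; the list traces a coarse diversity profile.
example_freqs = [0.5, 0.3, 0.2]
profile = [hill_number(example_freqs, q) for q in (0, 0.5, 1, 1.5, 2)]
```

At *q* = 0 the Hill number equals the allele count, at *q* = 2 the Tsallis entropy equals the heterozygosity 1 − Σ*p*², and the profile is non-increasing in *q*.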

Fig 3 shows that it takes tens of thousands of generations to reach equilibrium in the scenarios we considered. However, the starling data (Tables 2–5) show that the methods appear to be quite robust to all but the most extreme deviations from equilibrium in the newest western populations (Subpopulation 2, with relatively sparse data), even though the oldest of the populations was established only of the order of a hundred generations ago. It is encouraging that there is generally a good fit to real biological data, even with only small numbers of loci and the various known deviations from the theoretical model listed above, although the fit is better when there is averaging over more than one locus and when more time has been allowed for equilibration (Tables 2–5).

We summarize the major comparisons between the measures based on the traditional heterozygosity and our proposed measures based on Shannon entropy below. This summary also reveals the limitations of each approach.

- As discussed, heterozygosity and Shannon entropy each contain useful but partial information about an allele frequency distribution. These two measures, along with allele numbers [64,83, p. 263], are the three most informative special cases of a complete profile of the Tsallis entropies or the Rényi entropies. Measures based on the traditional heterozygosity disproportionately favor the frequent alleles whereas measures based on Shannon entropy weigh alleles in proportion to their frequencies.
- Both the heterozygosity and Shannon entropy and their differentiation measures can be linked to neutral genetic models under equilibrium, e.g., IAM and SMM for an isolated population, and IAM-FIM and SMM-FIM for subdivided populations. These formulas are shown in Table 1. Our formulas based on Shannon entropy in Table 1 for an isolated population are at least as simple as those based on heterozygosity. Although our formulas for subdivided populations and mutual information look more complicated, all can be numerically evaluated using standard software.
- Under IAM-FIM, the measures *G*_{ST} and Jost’s *D*, or equivalently ^{2}*H*_{T} and ^{2}*H*_{S}, can be jointly used to obtain analytic estimates of dispersal rate and mutation rate based on estimated heterozygosities (see Eqs. D5 and D6 in S4 Appendix for the estimation formulas and the footnotes of Table 3 for their estimates as applied to the starling data). Under SMM-FIM, a numerical method is required to obtain estimates of dispersal rate and mutation rate (see Eqs. D8 and D9 in S4 Appendix and the footnotes of Table 5 for their estimates as applied to the starling data). However, for measures based on Shannon entropy, currently it is not feasible to obtain analytic or numerical estimates of dispersal rate and mutation rate unless empirical equations are adopted [12,13].
- The Shannon differentiation measure *C*_{1n}, based on the between-group component of entropy, obeys stronger monotonicity properties than *G*_{ST} and Jost’s *D*, which are based on the between-group component of heterozygosity. A monotonicity property in Jost et al. [70] implies that Shannon’s differentiation measure always increases any time a new allele is added to any subpopulation, with any abundance, whereas *G*_{ST} and Jost’s *D* do not satisfy this property. In S6 Appendix, we further prove that if some copies of an allele that is shared among subpopulations are replaced by copies of unshared alleles, the Shannon differentiation measure always increases. We also give a counter-example to show that *G*_{ST} and Jost’s *D* do not satisfy this requirement. These monotonicity properties reveal that the Shannon differentiation measure has some good properties that are lacking for measures based on heterozygosity, and these properties may better capture the meaning of differentiation in many contexts, including conservation.
- The measure *G*_{ST} in FIM converges very quickly in the genetic stochastic processes whereas the normalized mutual information based on Shannon entropy converges relatively slowly. This is expected because the maximum possible value of *G*_{ST} is constrained by the subpopulation heterozygosity and thus takes values in a very narrow range, whereas the value of the normalized mutual information is not constrained a priori and thus potentially spans the full range [0, 1] no matter what the value of subpopulation entropy.
- Estimators of Shannon entropy or heterozygosity should be used, instead of calculating their observed values directly from the sample allele frequencies. From the perspective of statistical inference, measures based on heterozygosity can be accurately estimated from incomplete samples nearly without any bias because these measures focus on the frequent alleles, which always appear in samples. However, it is surprisingly non-trivial to make accurate estimates of population entropy based on small samples; it can be proven that no unbiased estimator exists [84]. Recently Chao et al. developed a low-bias entropy estimator [53]. See S4 Appendix for statistical estimation.

In conclusion, the theoretical advances presented here, combined with the estimation theory [53], should entice geneticists to add Shannon entropy to their genetic toolkit, and to develop connections between the entropy of allele proportion distributions, the entropy of gene sequences, the mutual information between gene regions, and other information-theoretical properties of genes. The R scripts for computing all measures discussed in this paper are available in S7 Appendix with comments.

## Supporting Information

### S1 Appendix. Derivation of the equilibrium expectation of Shannon entropy under IAM and SMM for an isolated population.

https://doi.org/10.1371/journal.pone.0125471.s001

(PDF)

### S2 Appendix. Derivation of the equilibrium expectation of total-population and subpopulation Shannon entropy under IAM-FIM.

https://doi.org/10.1371/journal.pone.0125471.s002

(PDF)

### S3 Appendix. Derivation of the equilibrium expectation of total-population and subpopulation Shannon entropy under SMM-FIM.

https://doi.org/10.1371/journal.pone.0125471.s003

(PDF)

### S5 Appendix. Dopamine receptor D4 (*DRD4*) and microsatellite data (3 loci) of four starling populations (Group 1-Group 4).

https://doi.org/10.1371/journal.pone.0125471.s005

(PDF)

### S6 Appendix. Two strong monotonicity properties for mutual information and Shannon differentiation measure.

https://doi.org/10.1371/journal.pone.0125471.s006

(PDF)

### S7 Appendix. R scripts for computing all measures discussed in this paper.

https://doi.org/10.1371/journal.pone.0125471.s007

(TXT)

## Acknowledgments

The authors thank the Academic Editor (Mark McDonnell), Peter Smouse and three anonymous reviewers for thoughtful and helpful comments and suggestions. This paper was initiated in July 2012 when AC, LJ and WBS attended the research program on “Mathematics of Biodiversity” and the “Exploratory Conference on the Mathematics of Biodiversity” organized by the Centre de Recerca Matemàtica (CRM), Barcelona, Spain. AC, LJ and WBS thank CRM for the invitation and Tom Leinster and colleagues for coordinating the program and conference. We thank Michael Whitehead for laboratory assistance.

## Author Contributions

Conceived and designed the experiments: AC LJ WBS. Performed the experiments: TCH KHM LAR. Analyzed the data: AC LJ TCH KHM WBS LAR. Contributed reagents/materials/analysis tools: TCH KHM LAR. Wrote the paper: AC LJ TCH KHM WBS LAR.

## References

- 1. Wright S. Evolution in Mendelian populations. Genetics. 1931; 16: 97–159. pmid:17246615
- 2. Crow JF, Kimura M. An introduction to population genetics theory. New York: Harper and Row Publishers; 1970.
- 3. Rousset F. Genetic structure and selection in subdivided populations. Princeton: Princeton University Press; 2004.
- 4. Hedrick PW. Genetics of populations. 3rd ed. Sudbury, MA: Jones and Bartlett Publishers; 2005.
- 5. Aczél J, Daróczy Z. On measures of information and their characterizations. New York: Academic Press; 1975.
- 6. Tsallis C, Brigatti E. Nonextensive statistical mechanics: A brief introduction. Continuum Mech Therm. 2004; 16: 223–235.
- 7. Keylock CJ. Simpson diversity and the Shannon-Wiener index as special cases of a generalized entropy. Oikos. 2005; 109: 203–207.
- 8. Jost L. Partitioning diversity into independent alpha and beta components. Ecology. 2007; 88: 2427–2439. pmid:18027744
- 9. Ellison AM. Partitioning diversity. Ecology. 2010; 91: 1962–1963. pmid:20715615
- 10. Chao A, Chiu C-H, Jost L. Unifying species diversity, phylogenetic diversity, functional diversity and related similarity and differentiation measures through Hill numbers. Annu Rev Ecol Evol Syst. 2014; 45: 297–324.
- 11. Shannon CE. A mathematical theory of communication. AT&T Tech J. 1948; 27: 379–423 and 623–656.
- 12. Sherwin WB, Jabot F, Rush R, Rossetto M. Measurement of biological information with applications from genes to landscapes. Mol Ecol. 2006; 15: 2857–2869. pmid:16911206
- 13. Sherwin WB. Entropy and information approaches to genetic diversity and its expression: Genomic geography. Entropy. 2010; 12: 1765–1798.
- 14. Dewar RC, Sherwin WB, Thomas E, Holleley CE, Nichols RA. Predictions of single‐nucleotide polymorphism differentiation between two populations in terms of mutual information. Mol Ecol. 2011; 20: 3156–3166. pmid:21736655
- 15. Buddle CM, Beguin J, Bolduc E, Mercado A, Sackett TE, Selby RD, et al. The importance and use of taxon sampling curves for comparative biodiversity research with forest arthropod assemblages. Can Entomol. 2005; 137: 120–127.
- 16. Jost L, Chao A, Chazdon RL. Compositional similarity and β (beta) diversity. In: Magurran AE, McGill BJ, editors. Biological diversity: frontiers in measurement and assessment. Oxford: Oxford University Press; 2011. pp. 66–84.
- 17. Horn HS. Measurement of "overlap" in comparative ecological studies. Am Nat. 1966; 100: 419–424.
- 18. Chao A, Jost L, Chiang SC, Jiang YH, Chazdon RL. A Two-Stage Probabilistic Approach to Multiple-Community Similarity Indices. Biometrics. 2008; 64: 1178–1186. pmid:18355386
- 19. MacArthur RH. Patterns of species diversity. Biol Rev. 1965; 40: 510–533.
- 20. Lewontin RC. The apportionment of human diversity. Evol Biol. 1972; 6: 381–398.
- 21. Xia Z, Jin G, Zhu J, Zhou R. Using a mutual information-based site transition network to map the genetic evolution of influenza A/H3N2 virus. Bioinformatics. 2009; 25: 2309–2317. pmid:19706746
- 22. Swati D. In silico comparison of bacterial strains using mutual information. J Biosci. 2007; 32: 1169–1184. pmid:17954978
- 23. Schall JJ, St Denis KM. Microsatellite loci over a thirty-three year period for a malaria parasite (Plasmodium mexicanum): Bottleneck in effective population size and effect on allele frequencies. Parasitology. 2013; 140: 21–28. pmid:22948096
- 24. Karlin EF, Andrus RE, Boles SB, Shaw AJ. One haploid parent contributes 100% of the gene pool for a widespread species in northwest North America. Mol Ecol. 2011; 20: 753–767. pmid:21199037
- 25. Rossetto M, Kooyman R, Sherwin W, Jones R. Dispersal limitations, rather than bottlenecks or habitat specificity, can restrict the distribution of rare and endemic rainforest trees. Am J Bot. 2008; 95: 321–329. pmid:21632357
- 26. Rossetto M, Thurlby KAG, Offord CA, Allen CB, Weston PH. The impact of distance and a shifting temperature gradient on genetic connectivity across a heterogeneous landscape. BMC Evol Biol. 2011; 11: 126. pmid:21586178
- 27. Mellick R, Lowe A, Rossetto M. Consequences of long-and short-term fragmentation on the genetic diversity and differentiation of a late successional rainforest conifer. Aust J Bot. 2011; 59: 351–362.
- 28. Shapcott A, Powell M. Demographic structure, genetic diversity and habitat distribution of the endangered, Australian rainforest tree Macadamia jansenii help facilitate an introduction program. Aust J Bot. 2011; 59: 215–225.
- 29. Rivers MC, Brummitt NA, Lughadha EN, Meagher TR. Genetic variation in Delonix sl (Leguminosae) in Madagascar revealed by AFLPs: fragmentation, conservation status and taxonomy. Conserv Genet. 2011; 12: 1333–1344.
- 30. Andrew RL, Ostevik KL, Ebert DP, Rieseberg LH. Adaptation with gene flow across the landscape in a dune sunflower. Mol Ecol. 2012; 21: 2078–2091. pmid:22429200
- 31. Chen S, Wan Z, Nelson MN, Chauhan JS, Redden R, Burton WA, et al. Evidence from Genome-wide simple sequence repeat markers for a polyphyletic origin and secondary centers of genetic diversity of Brassica juncea in China and India. J Hered. 2013; 104: 416–427. pmid:23519868
- 32. Gailing O, Hickey E, Lilleskov E, Szlavecz K, Richter K, Potthoff M. Genetic comparisons between North American and European populations of Lumbricus terrestris L. Biochem Syst Ecol. 2012; 45: 23–30.
- 33. Allen B, Kon M, Bar‐Yam Y. A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. Am Nat. 2009; 174: 236–243. pmid:19548837
- 34. Blum MJ, Bagley MJ, Walters DM, Jackson SA, Daniel FB, Chaloud DJ, et al. Genetic diversity and species diversity of stream fishes covary across a land-use gradient. Oecologia. 2012; 168: 83–95. pmid:21833642
- 35. Niederstätter H, Rampl G, Erhart D, Pitterl F, Oberacher H, Neuhuber F, et al. Pasture names with Romance and Slavic roots facilitate dissection of Y chromosome variation in an exclusively German-speaking alpine region. PLoS ONE. 2012; 7: e41885. pmid:22848647
- 36. Zhang J. Modeling multi-species interacting ecosystem by a simple equation. Int Joint Conf Comput Sci Optim. 2009; 1: 1003–1007.
- 37. Priness I, Maimon O, Ben-Gal I. Evaluation of gene-expression clustering via mutual information distance measure. BMC Bioinformatics. 2007; 8: 111. pmid:17397530
- 38. Meyer PE, Lafitte F, Bontempi G. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics. 2008; 9: 461. pmid:18959772
- 39. Ribeiro AS, Kauffman SA, Lloyd-Price J, Samuelsson B, Socolar JE. Mutual information in random Boolean models of regulatory networks. Phys Rev E. 2008; 77: 011901. pmid:18351870
- 40. Schwanz LE, Proulx SR. Mutual information reveals variation in temperature-dependent sex determination in response to environmental fluctuation, lifespan and selection. Proc R Soc B. 2008; 275: 2441–2448. pmid:18647722
- 41. Chanda P, Sucheston L, Liu S, Zhang A, Ramanathan M. Information-theoretic gene-gene and gene-environment interaction analysis of quantitative traits. BMC Genomics. 2009; 10: 509. pmid:19889230
- 42. Wu X, Jin L, Xiong M. Mutual information for testing gene-environment interaction. PLoS ONE. 2009; 4: e4578. pmid:19238204
- 43. Brunel H, Gallardo-Chacón J-J, Buil A, Vallverdú M, Soria JM, Caminal P, et al. MISS: a non-linear methodology based on mutual information for genetic association studies in both population and sib-pairs analysis. Bioinformatics. 2010; 26: 1811–1818. pmid:20562420
- 44. Yuan X, Zhang J, Wang Y. Mutual information and linkage disequilibrium based SNP association study by grouping case-control. Genes Genomics. 2011; 33: 65–73.
- 45. Dunn SD, Wahl LM, Gloor GB. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics. 2008; 24: 333–340. pmid:18057019
- 46. Kitchovitch S, Song Y, van der Wath R, Liò P. Substitution matrices and mutual information approaches to modeling evolution. Learning and Intelligent Optimization: Springer; 2009. pp. 259–272.
- 47. Penner O, Grassberger P, Paczuski M. Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies. PLoS ONE. 2011; 6: e14373. pmid:21245917
- 48. Shlush LI, Bercovici S, Wasser WG, Yudkovsky G, Templeton A, Geiger D, et al. Admixture mapping of end stage kidney disease genetic susceptibility using estimated mutual information ancestry informative markers. BMC Med Genomics. 2010; 3: 47. pmid:20955568
- 49. Zhang L, Liu J, Deng H-W. A multilocus linkage disequilibrium measure based on mutual information theory and its applications. Genetica. 2009; 137: 355–364. pmid:19707879
- 50. Smith RD. Information theory and population genetics; 2011. arXiv Preprint. arXiv:1103.5625.
- 51. Ricotta C, Moretti M. Quantifying functional diversity with graph-theoretical measures: advantages and pitfalls. Community Ecol. 2008; 9: 11–16.
- 52. Bulit C, Díaz-Ávalos C, Montagnes DJ. Scaling patterns of plankton diversity: a study of ciliates in a tropical coastal lagoon. Hydrobiologia. 2009; 624: 29–44.
- 53. Chao A, Wang YT, Jost L. Entropy and the species accumulation curve: a novel entropy estimator via discovery rates of new species. Methods Ecol Evol. 2013; 4: 1091–1100.
- 54. Cadotte MW, Davies TJ, Regetz J, Kembel SW, Cleland E, Oakley TH. Phylogenetic diversity metrics for ecological communities: integrating species richness, abundance and evolutionary history. Ecol Lett. 2010; 13: 96–105. pmid:19903196
- 55. Ewens WJ. The sampling theory of selectively neutral alleles. Theor Popul Biol. 1972; 3: 87–112. pmid:4667078
- 56. Kimura M, Crow JF. The number of alleles that can be maintained in a finite population. Genetics. 1964; 49: 725–738. pmid:14156929
- 57. Hill MO. Diversity and evenness: a unifying notation and its consequences. Ecology. 1973; 54: 427–432.
- 58. Jost L. G_{ST} and its relatives do not measure differentiation. Mol Ecol. 2008; 17: 4015–4026. pmid:19238703
- 59. Ohta T, Kimura M. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet Res. 1973; 22: 201–204. pmid:4777279
- 60. Kimura M, Ohta T. Distribution of allelic frequencies in a finite population under stepwise production of neutral alleles. Proc Natl Acad Sci. 1975; 72: 2761–2764. pmid:1058491
- 61. Kimura M, Ohta T. Stepwise mutation model and distribution of allelic frequencies in a finite population. Proc Natl Acad Sci. 1978; 75: 2868–2872. pmid:275857
- 62. Latter BDH. The island model of population differentiation: a general solution. Genetics. 1973; 73: 147–157. pmid:4687659
- 63. Whitlock MC, McCauley DE. Indirect measures of gene flow and migration: F_{ST} ≠ 1/(4Nm+1). Heredity. 1999; 82: 117–125. pmid:10098262
- 64. Wright S. Evolution and the genetics of populations. Vol. 2. The theory of gene frequencies. Chicago: University of Chicago Press; 1969.
- 65. Whitlock MC, Barton N. The effective size of a subdivided population. Genetics. 1997; 146: 427–441. pmid:9136031
- 66. Barton NH, Slatkin M. A quasi-equilibrium theory of the distribution of rare alleles in a subdivided population. Heredity. 1986; 56: 409–415. pmid:3733460
- 67. Maruyama T. Effective number of alleles in a subdivided population. Theor Popul Biol. 1970; 1: 273–306. pmid:5527634
- 68. Rousset F. Equilibrium values of measures of population subdivision for stepwise mutation processes. Genetics. 1996; 142: 1357–1362. pmid:8846911
- 69. Wright S. The distribution of gene frequencies under irreversible mutation. Proc Natl Acad Sci USA. 1938; 24: 253–259. pmid:16577841
- 70. Jost L, DeVries P, Walla T, Greeney H, Chao A, Ricotta C. Partitioning diversity for conservation analyses. Divers Distrib. 2010; 16: 65–76.
- 71. Chao A, Chiu C-H, Hsieh TC. Proposing a resolution to debates on diversity partitioning. Ecology. 2012; 93: 2037–2051. pmid:23094376
- 72. Beerli P, Palczewski M. Unified framework to evaluate panmixia and migration direction among multiple sampling locations. Genetics. 2010; 185: 313–326. pmid:20176979
- 73. Higgins SJ, Peter PJ, Cowling JM. Handbook of Australian, New Zealand and Antarctic Birds. Vol. 7. Boatbill to Starlings. Melbourne: Oxford University Press; 2006. pmid:16989664
- 74. Rollins LA. A molecular investigation of dispersal, drift and selection to aid management of an invasion in progress. Thesis, The University of New South Wales. 2009.
- 75. Rollins LA, Woolnough AP, Wilton AN, Sinclair R, Sherwin WB. Invasive species can't cover their tracks: using microsatellites to assist management of starling (Sturnus vulgaris) populations in Western Australia. Mol Ecol. 2009; 18: 1560–1573. pmid:19317845
- 76. Rollins LA, Woolnough AP, Sinclair R, Mooney NJ, Sherwin WB. Mitochondrial DNA offers unique insights into invasion history of the common starling. Mol Ecol. 2011; 20: 2307–2317. pmid:21507095
- 77. Woolnough AP, Massam MC, Payne RL, Pickles GS. Out on the border: keeping starlings out of Western Australia. Manaaki Whenua Press, Landcare Research; 2005. pp. 183–189.
- 78. Fidler AE, van Oers K, Drent PJ, Kuhn S, Mueller JC, Kempenaers B. Drd4 gene polymorphisms are associated with personality variation in a passerine bird. Proc R Soc Lond B Biol Sci. 2007; 274: 1685–1691.
- 79. Mueller JC, Edelaar P, Carrete M, Serrano D, Potti J, Blas J, et al. Behaviour‐related DRD4 polymorphisms in invasive bird populations. Mol Ecol. 2014; 23: 2876–2885. pmid:24750181
- 80. Wagner A. Neutralism and selectionism: a network-based reconciliation. Nat Rev Genet. 2008; 9: 965–974. pmid:18957969
- 81. Chao A, Shen T-J. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environ Ecol Stat. 2003; 10: 429–443.
- 82. Rényi A. On measures of entropy and information. Vol. 1. Berkeley: University of California Press; 1961. pp. 547–561.
- 83. Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975; 7: 256–276.
- 84. Blyth CR. Note on estimating information. Ann Math Stat. 1959; 30: 71–79.