Shared Information between Residues Is Sufficient to Detect Pairwise Epistasis in a Protein

In a comment on our manuscript "Strong selection significantly increases epistatic interactions in the long-term evolution of a protein", Dr. Crona challenges our assertion that shared entropy (that is, information) between two residues implies epistasis between those residues. The comment constructs an explicit example of three loci (say A, B, and C), where A and B are epistatically linked (leading to shared entropy between A and B), and A and C are also epistatically linked (leading to shared entropy between A and C), so that loci B and C are correlated (share entropy).

Note that these values are conditional on the state of locus A, but for the case discussed here they happen to be equal and vanishing. Clearly, this is a special case. In general, the pairwise epistasis conditional on the state of another locus can depend on that state, and if there are n other binary loci, then there could in principle be 2^n different values for the pairwise epistasis. Surely this is not satisfactory, as pairwise epistasis would then not be uniquely defined. Instead, pairwise epistasis should be unconditional on the state of other loci in the genome. How should this be calculated?
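To make the counting argument in the preceding paragraph concrete, the following minimal sketch enumerates every background state of n other binary loci and computes a conditional B-C epistasis for each. Everything in it is an assumption made for illustration: the function `toy_fitness`, its coefficients, and the log-ratio form of the epistasis measure are hypothetical and are not taken from [1] or [2].

```python
import itertools
import math


def conditional_epistasis(fitness, background):
    """Log-ratio epistasis between B and C for one fixed background state.

    Assumed measure: log(w00 * w11 / (w01 * w10)).
    """
    w = {(b, c): fitness(background, b, c) for b in (0, 1) for c in (0, 1)}
    return math.log(w[0, 0] * w[1, 1] / (w[0, 1] * w[1, 0]))


def toy_fitness(background, b, c):
    """Hypothetical landscape: the B-C interaction strength depends on the
    background, read here as a binary number so each background differs."""
    k = sum(bit << i for i, bit in enumerate(background))
    return math.exp(0.1 * b + 0.2 * c + 0.05 * k * b * c)


n = 3  # number of other binary loci (illustrative choice)
values = {
    bg: conditional_epistasis(toy_fitness, bg)
    for bg in itertools.product((0, 1), repeat=n)
}

for bg, eps in values.items():
    print(bg, round(eps, 3))
```

With this toy landscape each of the 2^n backgrounds yields a different conditional value, which is the ambiguity the text describes.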
We assert that pairwise epistasis between two loci should depend on the fitness effect of mutations at those loci where the states of the other loci are determined by mutation-selection balance in a population. In other words, we assert that fitness effects should be measured by the effect on the growth rate of a population. For the three-locus system, the fitness of the BC system depends on the frequencies of the A = 0 and A = 1 alleles in the population. Let p_0 stand for the frequency of the A = 0 allele and p_1 for that of the A = 1 allele, with p_0 + p_1 = 1. Then the fitness of each of the four BC genotypes is obtained by averaging over the allelic state of A, weighted by p_0 and p_1. These four values can be used to calculate the epistasis between loci B and C unconditional on the state of A, ε_BC. We plot this quantity in Fig 1 as a function of the frequency p_0 of the A = 0 allele and see that it is everywhere positive except at p_0 = 0 and p_0 = 1, where it reduces to the conditional epistasis values of Crona (Eqs 2 and 3).
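The averaging procedure described above can be sketched numerically. The fitness table `W` below is hypothetical and is not the landscape of Eq 1; it is chosen only so that the B-C epistasis conditional on either state of A vanishes. The log-ratio epistasis measure is likewise an assumption made for this illustration.

```python
import math

# Hypothetical fitness values W[(a, b, c)], NOT the landscape of Eq 1.
# For each fixed A the fitness factorizes over B and C, so the
# conditional B-C epistasis is zero for A = 0 and for A = 1.
W = {
    (0, 0, 0): 1.00, (0, 0, 1): 0.50, (0, 1, 0): 0.50, (0, 1, 1): 0.25,
    (1, 0, 0): 0.25, (1, 0, 1): 0.50, (1, 1, 0): 0.50, (1, 1, 1): 1.00,
}


def averaged_fitness(p0, b, c):
    """Fitness of BC genotype (b, c), averaged over A with weights p0, 1 - p0."""
    return p0 * W[(0, b, c)] + (1.0 - p0) * W[(1, b, c)]


def unconditional_epistasis(p0):
    """Assumed measure: log(w00 * w11 / (w01 * w10)) on the averaged fitness."""
    w = {(b, c): averaged_fitness(p0, b, c) for b in (0, 1) for c in (0, 1)}
    return math.log(w[0, 0] * w[1, 1] / (w[0, 1] * w[1, 0]))


for p0 in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p0 = {p0:.2f}  eps_BC = {unconditional_epistasis(p0):.4f}")
```

For these made-up values the result works out to log(1 + 2.25 p_0 p_1), which is zero only at p_0 = 0 or p_0 = 1 and positive for any mixture of the two A alleles. The numbers are not expected to match those quoted for the landscape of Eq 1.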
However, given the fitness landscape Eq 1, these extreme values (a population composed purely of one allelic state of A) are impossible. As long as the mutation rate is nonvanishing, there will always be a mixture of both alleles at locus A. Indeed, Table 1 of the supplementary information of [1], which tabulates an evolutionary simulation on that precise landscape, makes that point for us. Crona finds that p_0 ≈ 0.998 in equilibrium, leading to ε_BC ≈ 0.191, which is nonvanishing. Thus, the positive shared entropy between those loci is indeed sufficient to determine nonvanishing pairwise epistasis between them. We also remark that at that frequency p_0, the information between B and C is exceedingly small: I ≈ 0.0013.
Needless to say, the example discussed here is a fairly contrived one, and we show in Fig 9 of [2] that the correlation between epistasis and information is robust when testing random fitness landscapes. The same argument holds for examples 2 and 3 in the comment [1].
In conclusion, the assertion in [1] that detecting epistasis via shared entropy gives false positives for epistasis is based on a calculation of conditional epistasis, a concept that is ambiguous at best, as it depends on the allelic state of all the other loci in the genome and could take on arbitrary values. If epistasis is calculated by averaging over the allelic state of the other loci, then our assertion that correlation (positive information) implies positive pairwise epistasis holds without exception.

Author Contributions
Conceptualization: AG CA.