A Network Model for the Correlation between Epistasis and Genomic Complexity

The study of genetic interactions (epistasis) is central to the understanding of genome organization and evolution. A general correlation between epistasis and genomic complexity has been recently shown, such that in simpler genomes epistasis is antagonistic on average (mutational effects tend to cancel each other out), whereas a transition towards synergistic epistasis occurs in more complex genomes (mutational effects strengthen each other). Here, we use a simple network model to identify basic features explaining this correlation. We show that, in small networks with multifunctional nodes, lack of redundancy, and absence of alternative pathways, epistasis is antagonistic on average. In contrast, lack of multi-functionality, high connectivity, and redundancy favor synergistic epistasis. Moreover, we confirm the previous finding that epistasis is a covariate of mutational robustness: in less robust networks it tends to be antagonistic whereas in more robust networks it tends to be synergistic. We argue that network features associated with antagonistic epistasis are typically found in simple genomes, such as those of viruses and bacteria, whereas the features associated with synergistic epistasis are more extensively exploited by higher eukaryotes.


1. Overall, there are as many boxes as mutations.
2. The complete pattern may contain several subpatterns, one for each pathway length under consideration. Each subpattern corresponds to a different pathway length.
3. Each subpattern is an arrangement of boxes in rows and columns such that the number of boxes in a row (or column) cannot increase as we move down (or right) within it.
4. The number of columns in a given subpattern is limited to the number of different pathways in the considered network that have the length associated with this subpattern.
II. The number of different cases in a given class of mutations is derived from the corresponding pattern according to the following rules.
1. The overall number associated with a pattern is obtained as the product of the numbers corresponding to the subpatterns it is made of.
2. The number corresponding to a subpattern associated with paths of length n, when k pathways of this length are present, is calculated as follows: (a) Each box in the first (top) row is assigned a factor $n(k+1-c)$, where c is the column at the top of which the box is found (i.e. a factor $nk$ for the top-left box, $n(k-1)$ for the box next to it on the right, and so on). (b) Each box not in the first row is assigned a factor of n. The power $1/q_c$ appears in order to cancel the overcounting that comes from extending the product to all columns; for example, for three identical columns one would get $(3!)^{1/3}(3!)^{1/3}(3!)^{1/3} = 3!$. In plain words, the symmetry factor has one piece coming from the permutation symmetry of boxes in each column and one piece coming from the permutation symmetry of identical columns.
III. The complete analysis for a given number of mutations is obtained by considering the contribution of all the possible allowed patterns.
1. The number of cases in each class is computed as in II.
2. The total number of possible cases with g genes and m mutations is just $g^m$.
3. The epistasis value associated with each pattern is straightforward to compute, because the number of columns directly gives the number of disabled paths. The contribution of each pattern to the final epistasis value is then weighted by its number of cases.
4. Once the expected epistasis value is known from the previous item, the expected variance is also straightforward to compute.
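
The counting rules can be cross-checked by brute force on a small network. The sketch below is only an illustration under explicit assumptions that are not stated in this form in the text: the pathways are disjoint, every gene lies on exactly one pathway, a pathway counts as disabled as soon as any of its genes is mutated, and mutations are ordered draws with repetition allowed (which is what a total of $g^m$ cases suggests). The function name `disabled_path_distribution` is hypothetical.

```python
from collections import Counter
from itertools import product


def disabled_path_distribution(path_lengths, m):
    """Count ordered m-mutation cases by number of disabled pathways.

    Assumptions (illustrative only): pathways are disjoint, each gene
    belongs to exactly one pathway, and one hit disables a pathway.
    """
    # Map each gene index to the pathway it belongs to.
    gene_to_path = [p for p, n in enumerate(path_lengths) for _ in range(n)]
    g = len(gene_to_path)

    counts = Counter()
    # Enumerate all g**m ordered assignments of mutations to genes.
    for hits in product(range(g), repeat=m):
        disabled = {gene_to_path[h] for h in hits}
        counts[len(disabled)] += 1

    # Sanity check against rule III.2: the total number of cases is g**m.
    assert sum(counts.values()) == g ** m
    return dict(counts)


if __name__ == "__main__":
    # Toy network: one pathway of length 2, one of length 1, 2 mutations.
    print(disabled_path_distribution((2, 1), 2))
```

Exhaustive enumeration grows as $g^m$, so it is practical only for small networks; its purpose here is to check the class counts obtained from the patterns, not to replace them.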

Example 3: 10 nodes, 1 path of length 8 and 1 path of length 2, 7 mutations (see Table 2).

N.B. Readers with some familiarity with number theory or the symmetric group $S_n$ will recognise these box subpatterns as Ferrers graphs or Young diagrams; they are directly useful here because they represent partitions of integers (in this case, partitions of the number of mutations), which is just what is required for an organised and exhaustive calculation procedure.
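
Since the subpatterns are Young diagrams, the admissible shapes can be listed mechanically as integer partitions. The following is a minimal sketch, assuming that each part of a partition is read as the height of one column and that the column limit of a subpattern is passed as `max_parts`; the function name and this orientation convention are illustrative choices, not taken from the text.

```python
def partitions(total, max_parts, max_part=None):
    """Yield partitions of `total` as non-increasing tuples.

    `max_parts` bounds the number of parts (read here as the number of
    columns of a subpattern); `max_part` bounds the largest part and is
    used internally to keep the parts non-increasing.
    """
    if max_part is None:
        max_part = total
    if total == 0:
        yield ()  # the empty diagram
        return
    if max_parts == 0:
        return  # boxes remain but no column is left to hold them
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, max_parts - 1, first):
            yield (first,) + rest


if __name__ == "__main__":
    # Shapes for 7 boxes in a subpattern with at most 2 columns.
    for shape in partitions(7, 2):
        print(shape)
```

For a complete pattern, the mutations would first be split among the subpatterns (one per pathway length) and each share partitioned in this way.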