Getting higher on rugged landscapes: Inversion mutations open access to fitter adaptive peaks in NK fitness landscapes

Molecular evolution is often conceptualised as adaptive walks on rugged fitness landscapes, driven by mutations and constrained by incremental fitness selection. It is well known that epistasis shapes the ruggedness of the landscape’s surface, outlining their topography (with high-fitness peaks separated by valleys of lower fitness genotypes). However, within the strong selection weak mutation (SSWM) limit, once an adaptive walk reaches a local peak, natural selection restricts passage through downstream paths and hampers any possibility of reaching higher fitness values. Here, in addition to the widely used point mutations, we introduce a minimal model of sequence inversions to simulate adaptive walks. We use the well known NK model to instantiate rugged landscapes. We show that adaptive walks can reach higher fitness values through inversion mutations, which, compared to point mutations, allows the evolutionary process to escape local fitness peaks. To elucidate the effects of this chromosomal rearrangement, we use a graph-theoretical representation of accessible mutants and show how new evolutionary paths are uncovered. The present model suggests a simple mechanistic rationale to analyse escapes from local fitness peaks in molecular evolution driven by (intragenic) structural inversions and reveals some consequences of the limits of point mutations for simulations of molecular evolution.


Introduction
The fitness landscape is a very influential metaphor introduced by Wright [1] to describe evolution as explorations through a "field of possible genes combinations", where high values of fitness are represented as peaks separated by valleys of lower fitness. The topography of the fitness landscape has important evolutionary consequences, e.g. speciation via reproductive isolation [2]. Within this framework, the evolution of any population can be conceptualised as adaptive walks driven by successive mutations constrained by incremental or neutral fitness steps. Thus, in the absence of additional evolutionary forces such as drift or environmental variations, once a population reaches a local peak, natural selection hampers any further mutational paths that decrease fitness. However, there are empirical evidences showing that populations do not stop indefinitely at a local peak and can explore alternative trajectories on the landscape [3][4][5][6][7], ergo, the following question arises: How does evolution escape from a local peak to a fitter one?
Since it has been formulated, considerable progress have been made on this question [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. However, conventional theoretical approaches still state that a genotype mutates into another through point mutations (e.g. single-nucleotide variations). If one takes a look at the molecular scale of DNA and the different mutation types, this may seem contradictory as it is well known that many other kinds of variation operators (including insertions, deletions, duplications, translocations and inversions) act on the genome. Hence, a fundamental aspect of this challenge is to understand-at the scale of molecular evolution-the roles played by these different mutation types. Indeed, there is a gap between the theoretical models, that account for a very limited set of mutations types-typically only point mutations-and the reality of molecular evolution, where multiple variation operators act on the sequence.
As a contribution to bridge this gap, we present a minimal DNA-inspired mechanistic model for inversion mutations, and explore their relationship with the escape dynamics from local fitness peaks. Inversion mutations are one of the most frequent chromosomal rearrangements [23,Ch. 17.2] with lengths covering a wide range of sizes. For example, in Long Term Evolution Experiments with E. coli, chromosomal rearrangements have been characterised by optical mapping (hence limiting the resolution to rearrangements larger than 5000 bp) [24]. In this study, 75% of evolved populations showed inversion events-ranging in size from *164 Kb to *1.8 Mb [24] (for other examples of large inversions in different clades see [25]). With the development of novel sequencing technologies [26][27][28], it has been possible to identify intragenic (submicroscopic) DNA inversions, for example, an inversion of seven nucleotides in mitochondrial DNA, resulting in the alteration of three amino acids and associated to an unusual mitochondrial disorder [29]. Intragenic inversions have also been suggested to be an important mechanism implied in the evolution of eukaryotic cells [30]. Although chromosomal inversions are ubiquitous in many evolutionary processes [23][24][25][26][27][29][30][31][32][33][34][35][36][37][38], very little is known about their theoretical description and computational simulation at the sequence level, as models generally focus on very large inversions (typically larger than a single gene), hence on their effect on synteny (or the deleterious effects at breakpoints), but neglect the possibility that small inversions occur inside coding sequences [39].
Here, we simulate a representation of molecular evolution of digital organisms (replicators), each of which contains a single piece of DNA. We engineer a computational method to cartoon the double-stranded structure of DNA, and simulate inversion-like mutations consisting of a permutation of a segment of the complementary strand, which is then exchanged with the main strand segment (see Methods, schema 6). For the sake of simplicity, we consider digital genotypes made up of binary nucleotides (i.e. a binary alphabet {0, 1} instead of the four-nucleotides alphabet {A, T, C, G}). The sequences are arranged in circular strings with constant number of base-pairs. In an abstract sense, the model mimics the molecular evolution of some viruses [40] and (animal) mitochondrial DNA [41] with compact genomes and closed doublestranded DNA circles [40,42,43]. It is very important to emphasise that our computational model simulates intragenic-like mutations [29,30,44]. We are modelling asexual replication, therefore recombination is not considered. To build rugged fitness landscapes, we adopt the well known Kauffman NK model, where N denotes the length of the genome and K parameterizes the "epistatic" coupling between nucleotides [45][46][47][48]. We do not include environmental changes, so the landscape remains constant through the simulations. Finally, it is worth mentioning that all our simulations were conducted in the evolutionary regime of strong selection weak mutations (SSWM) [49,Ch. 5].

Who is next to whom: The mutational network
We first study how inversion mutations can increase the number of accessible mutants. For this we translate the canonical notion of neighbour genotypes (see Methods, Eq 1) into a graph theory approach, and analyse the simplest and most familiar geometric object in molecular evolutionary theory: the discrete space of binary sequences (unless explicitly stated we will henceforth consider only binary alphabets). Thinking topologically, all the sequences x with N binary-nucleotides x i 2 {0, 1}, 8i = 1, . . ., N and N 2 N �2 , define the set X 2 f0; 1g N of 2 N possible genotype combinations. A canonical measure to characterise the topology of the set X is the Hamming distance A convenient way to organise such a set is by graphs connecting two sequences x and x 0 that differ by one point mutation (i.e. d H (x, x 0 ) = 1). This is the so-called Hamming graph HðN; 2Þ-a special case of the hypercube graph Q N (the well-known graph representation of the genotype space). On the other hand, the Hamming distance for inversion mutations forms a set of integers satisfying meaning that, contrary to point mutations, the Hamming distance of inversion mutations range from zero to N (see Methods). Note that if the inversion spans the entire chromosome, then d H (x, x 0 ) = N, all loci have changed, but it also implies that 5'-3' becomes 3'-5' and vice versa and nothing has changed biologically.
We propose that, for a sequence x 2 X, the mutation operation (i.e. the mechanistic representation of point or inversion mutations) build the set N n ðxÞ of accessible mutants x 0 2 N n ðxÞ; 8x 0 6 ¼ x; ) d H ðx; x 0 Þ 6 ¼ 0 (the subindex ν denotes the type of mutation: P for point mutations and I for inversion mutations). Therefore, the number of neighbouring mutants can be reformulated as D n ðxÞ ¼ jN n ðxÞj: Inversion's combinatorics is not trivial since it involves the permutation of a subsequence and its flips between each strand (see Methods, schemata 2 and 6). Nevertheless, from the algorithmic point of view, for a given genotype x the mutational operations can be used to enumerate all the accessible mutants (see Methods, Algorithm 1: Mutate). Also, the combinatorics can be represented as a directed multigraph of mutations mðxÞ; 8x 2 X (see the mathematical definition in [50, p.8]), i.e. the ordered triple m ¼ ðVðmÞ; EðmÞ; I m Þ; where VðmÞ ¼ x [ N n ðxÞ is the set of vertices (formed by a given genotype x and its mutated genotypes fx 0 g 2 X), E(m) is the set of directed edges (from genotype x to a mutated genotype x 0 ) and I m : X ! X is an incidence relation that associates to each element of E(m) an ordered pair of V(m). In fact, the incidence relation I m corresponds to a mutation operation (see Methods, Eqs (3), (4) and (5)). As an example, in Fig 1 we display the atlas of accessible-mutants for N = 4, constructed by calculating all inversion mutations for each one of the 2 4 (wild-type) sequences (central red dots). In this example (see also . We can also verify that min(D I (x)) � max(D P (x)). In Table 1 we show the enumeration of accessible-mutants for inversions and point mutations for genome sizes ranging from N = 2 to 10. From Fig 1 and Table 1, we can see that the combinatorics of the inversion mutations is not trivial. We can verify that the maximum number of accessible mutants is equal to N 2 − N + 1, which corresponds to the trivial cases of genotypes x with x i = 0, 8i 2 {0, . . .N} and x i = 1, 8i 2 {0, . . .N}. Note that for a circular sequence of size N, the total number of inversion mutations is N 2 , while for point mutations this number is equal to N. However, the number of mutants accessible by inversions is lower than the total number of inversions mutations (D I < N 2 ). This is due to "degenerate" inversion mutations: several inversions-occurring between different loci and/or for different interval sizes-may mutate the initial sequence to the same accessible mutant (see the multiple edges in Fig 1). In Fig 1, we can also verify that there are loops (an "edge" joining a vertex to itself), that is, "invariant inversions" that preserves the nucleotide sequence after the inversion operation (i.e. d H (x, x 0 ) = 0). It can easily be shown that the fraction of invariant inversions converges to 1/N (see S1 Text, section 1). A very important consequence of inversions is that mutated sequences can differ with the wild type by more than one nucleotide, i.e. d H (x, x 0 ) > 1 (edges colours in Fig 1 denote the values of the Hamming distance). This result allows us to gain a first insight of how inversions can promote the escape from local fitness peaks: they can "connect", in a single mutational event, genotypes that are at two or more point-mutational steps away. It is pertinent to remark that the combinatorics of inversions for alphabets with size jA L j ¼ 2n; 8n � 2 would imply even more connections.
It should be noted that the inversion combinatorics would be slightly different for linear sequences. In that case, the number of possible inversions is N(N + 1)/2.
Up to this point, we have shown how inversion mutations can actually broaden the horizon of evolutionary exploration in the genotype space.
Inversions rewire the adjacency of the genotype space. Now, for each directed graph of mutations m(x), we can associate a graph M(x) on the same set of vertices V(m). Corresponding to each directed edge of m, there is an edge of M with the same ends (loops being excluded). In this sense, the graph M(x) is the underlying (simple) graph of the directed graph m(x). That is, the graphs without self-and directed-edges so that when several edges (mutations) connect the genotype x with the same accessible-mutant x 0 , only one (undirected) edge is kept. Thus, every directed graph m(x) defines a unique, up to isomorphism, reduced graph M(x) (see the mathematical definition in [50], p.3). Now it is natural to do the union of each graph M(x), to describe how genotypes can be reached from somewhere in the genotype space in one mutation operation. We call this object the mutational network. It is defined as:

MðxÞ:
For point mutations the mutational network is the Hamming graph HðN; 2Þ [51, p. 230] (as we can see in Fig 2A, 2B and 2C for Hð4; 2Þ, Hð7; 2Þ and Hð10; 2Þ respectively), which is isomorphic to the canonical genotype space HðN; 2Þ ¼ Q N . The notion of isomorphism means that the mutational network for point mutations preserves the adjacency of the edge structure of the genotype space. Historically, the canonical graph of the genotype space overshadowed the richness of the (full) mutation graph, since theoretically only point mutations are usually considered as generators of mutational networks. For inversions, the mutational network does not necessarily inherit the (local) topology of the genotype space. For example, Fig 2D, 2E and 2F outline the structure of the mutational networks for inversion mutations for N = 4, 7 and 10. From the point of view of graph theory, inversion mutations "rewire" the adjacency of the genotype space, i.e. they link genomes such that d H (x, x 0 ) � 1. Also, in graph terms, the total number of accessible mutants per genotype corresponds to the node's degree κ x (defined as the number of edges in the graph incident on x [50, p. 3]). Therefore, κ x = D ν (x). On the other hand, the average node degree quantifies accessible-mutants (nodes) interconnections: It can be verified that for point mutations hκi = N, while for inversions mutations hκi > N (for N � 2) and therefore the genotypes are "more connected" to each other. Paraphrasing in terms of evolutionary biology, they are "more mutable". In this sense, hκi defines a mean mutability, which quantifies the ability to reach a different genome when the sequence undergoes a mutation. This property also holds for linear chromosomes, although as mentioned above, the average of node degrees is smaller since the number of possible inversions is lower than for circular chromosomes.

Inversions can reveal new evolutionary paths
Even though the nature of the genotype-to-fitness function is still largely unknown, an easy way to introduce it into computational models is by assuming that for genotypes x 2 X there exists a map from the set X to the real numbers f : X ! R. In the graph-based representation, each node (genotype) then possesses a fitness value f(x). This fitness landscape graph F is isomorphic to the hypercube graph Q N (i.e. the genotype space) and therefore can also be represented as Hamming graphs, providing a fitness value per node. So, the fitness landscape graph is univocally defined as: Likewise, as the mutational network we can also define the fitness network F , but in this case the edges are directed from genotypes with lower fitness to genotypes with higher fitness: which also depends on the neighbouring N n , with ν denoting the type of mutation (P and I for point and inversion mutations respectively). Therefore, the fitness network is the anisotropic version of the mutational network, the direction of the evolutionary paths being fitness-dependent. Precisely what mutational and fitness networks reveal to us is the ensemble of possible evolutionary paths. But in this case, fitness networks are diagrams showing the paths upward and their "altitudes" (fitness values).
To illustrate the definition of fitness network used in this work, we need to build fitness landscapes instances. For that, we use the well-known NK-model [45,46,48], recalling that for a genome of size N, the parameter K 2 {0, . . ., N − 1} corresponds to the epistatic coupling between loci and thus tunes the ruggedness of the landscape (for a brief introduction see Methods). Here, we use two neighbourhood coupling models between loci: i) the adjacent model in which the K loci are those closest to a focal locus x i ; ii) the random neighbourhood model, where the K loci are chosen randomly among the N − 1 loci other than x i (an illustration of epistatic interactions is sketched in NK model in Methods). In Fig 3 we show representative examples of fitness networks engineered for N = 4 with K epistatic random neighbours (see S1 Fig for the fitness networks with K epistatic adjacent neighbours). The landscapes range from single peaked K = 0 (no epistatic interaction) to full rugged landscapes K = 3 (highly connected epistatic interactions). Global fitness maxima and minima are highlighted by encircled nodes. In Fig 3A, 3B and 3C we show an instance of fitness networks for genotypes connected through pathways with point mutation steps. They have the same topology as the genotype space Q N¼4 and, therefore, are isomorphic to the graph representation of the fitness landscape F ffi F ¼ ðHð4; 2Þ; f Þ. When K = 0 all the trajectories arrive to the single global maximum of fitness. When we construct the fitness network with inversion mutations, we can verify in Fig  3D, 3E and 3F that there are more paths between all the genotypes, and so it is easier for an evolutionary process to explore more domains of the fitness landscape compared to point mutations. Many of these paths connect genotypes such that d H (x, x 0 ) > 1, and therefore, are like jumps between distant domains of the landscape. We can also verify that, for a given fitness function f, a node that is a local optimum on the fitness graph F ¼ ðHðN; 2Þ; f Þ, is not The layers are constructed such that each node is assigned to the first possible layer, with the constraint that all its predecessors must be in earlier layers. The colors of the nodes correspond to the values of the out-degrees, i.e. the number of edges going out of a node (note that color scales differ in range between panels). Therefore, nodes with node out-degrees equal to zero correspond to local fitness maxima (sink nodes). The landscapes' ruggedness are: single peaks K = 0, intermediate ruggedness K = 1 and full rugged case K = 3. Node sizes are scaled with fitness values (the best fitness, the largest, and vice versa). Global maximum of fitness are encircled in red. While the global minimum in blue. The total number of fitness maxima and minima are also reported. See S1 Fig  necessarily a local peak for inversion mutations in the fitness network. In most of the cases, the fitness landscape is "smoothed out" by inversion mutations, since the notion of local peak fades in the fitness network. However, the local peaks are not always smoothed out, as we can see in Fig 3E for K = 1, where genotype 0101 remains as a local peak. This is because 0101 cannot be mutated to genotype 1001 by any inversion mutation (see also the combinatorics of accessible mutants for 0101 in Fig 1). Note that for inversion mutations, it is verified that in some cases the global maximum can be reached from the global minimum in a single evolu- Finally, contrary to point mutations, inversions are not commutative: in many cases, two overlapping inversions applied to a same initial sequence in direct or reversed order lead to different final sequences. This can easily be shown on an example: where, starting from the same sequence, the two inversions (inv(2, 3) and inv(2, 4)) give different outcomes depending on their order. Note that, given this property, the classical definition of mutational epistasis does not hold for inversion mutations.

Getting higher on rugged landscapes
Up to now, we have shown results on the combinatorial (topological) differences between point and inversion mutations. Inversions cannot be mapped to the classical "fitness landscape" metaphor-being better represented through mutational networks and their juxtaposition with fitness landscapes through fitness networks. This is because, for inversion mutations, there are shortcut routes connecting distant sequences (differing by more than one base) in the genotype space and consequently in the fitness landscape. Therefore, this can be interpreted as "escape routes" from local peaks. We want to verify if as a consequence of these escape routes, an evolutionary process will be able to reach higher peaks of fitness. For that, we performed computer simulations in the SSWM setting, where adaptation occurs by sequential fixing of novel beneficial mutations (see Adaptive walks in Methods).
We focus our study on a series of n-repetitions of adaptive walks, where the evolutionary process is driven by (random) mutational steps. For a given set of independent initial random genomes with size N = 100, {x 0 } 2 {0, 1 } 100 , we create two pools of n = 100 simulations for point mutations and inversions respectively. As before, we use the NK model to engineer rugged fitness landscapes. In each round, the landscape is the same for simulations with point mutations and inversions, respectively. For independent explorations over (sub)domains of the landscape, we monitor the time-evolution of the fitness values until a fitness optimum is reached. This is when it is verified, in the simulation, that a genotype x loc n 2 X satisfies f ðx 0 Þ < f ðx loc n Þ; 8x 0 2 N n ðx loc n Þ: Subindex ν denotes the type of mutation (P for point mutations and I for inversion mutations). Then, we calculate the mean fitness value per K as: where the notation | K means that the average is calculated for a fixed value of K 2 {0, . . ., N − 1}.
In Fig 4 we show the behaviour of the average fitness hf ν i K , calculated from n = 100 instances of adaptive walks simulations in NK landscapes, with N = 100 and values of K ranging from 0 to 99. The simulations correspond to the case of epistatic interactions with K closest adjacent loci ('+' marker), and with K randomly chosen loci ('o' marker). The markers of the simulations with point mutations and inversions are coloured with blue and red respectively.
For the simplest case K = 0, with no epistatic interaction between neighbouring loci, we can verify in Fig 4, that the average fitness for point mutations and inversions are equal hf P i K=0 = hf I i K=0 ' 0.667. In this case, the landscape is smooth with only a single peak. Hence, mutations that increase fitness are not hard to find and hf ν i K=0 is independent of mutation types. This result also agrees with the Kauffman's (analytical) result hf i K¼0 ¼ 2 3 , which was calculated using order statistics arguments [48, p. 55]. Then, for K > 0, the average fitness hf ν i K increases with K until reaching a maximum value of fitness. In relation to this maximum, in Fig 4 we can identify the following four cases: i) point mutations with adjacent epistatic interactions, hf P i K = 0.707 for K = 2; ii) point mutations with random epistatic interactions, hf P i K = 0.722 for K = 5; iii) inversions with adjacent epistatic interactions, hf I i K = 0.747 for K = 4; and iv) inversions with random epistatic interactions hf I i K = 0.732 for K = 4. After these maximum, the fitness values decrease as K increases. This trend is also consistent with the seminal simulations carried out by Kauffman and Weinberger [46,48]. For example, when K ! N − 1, the mean fitness converge to the same value, regardless of the type of epistatic interaction neighbourhood. For point mutations with random and adjacent epistatic interactions, we obtained hf P i K=96 ' 0.580 and this agrees very well with the Kauffman's numerical outcomes for K = 96, c.f. [48, Tables 2.1 and 2.2]. It is worth mentioning that these trends-lower fitness being associated with increasing epistatic interactions-correspond to the well-known "complexity catastrophe" described by Kauffman [48, p. 52] (see also Refs. [52,53]). These numerical outcomes confirm that our numerical set-up reproduces, with point mutations, what is known about hf P i K vs K in the NK model [46,48]. Now, what's new is that for inversions, the average fitness values are higher than those for point mutations. Indeed, for K ! N − 1, the average fitness trend is very different from that of point mutations. For example, the complexity catastrophe estimates that as K increases, the expected fitness of the local maximum (for point mutations) decreases toward 1/2 [48], which is indeed verified here. But for inversions, the evolutionary process reaches higher expected fitness values hf I i K=99 = 0.610 > hf P i K=99 = 0.579, for both random and adjacent epistatic interactions. To generalize these results, we reproduced this experiment with other, more restrictive, definition of inversion mutations. More specifically, we tested inversions on linear chromosomes (with boundary conditions) and circular inversions with upper size limit s � N, ranging from 1%-a single locus (i.e. inversions are like point mutations)-up to 100% of chromosome size. Simulations with linear chromosomes show no significant difference from our reference circular model (see supplementary S1 Text, section 2 for detailed results). Simulation with an upper size limit show that the final fitness values increases as the size limit increases. However, the gain is maximal for small s values (typically up to s = 16) showing that small and mid-sized inversions are sufficient to reach high fitness peaks. Fig 4 also show that, in average and for almost all K, the adaptive walks reach higher fitness peaks through inversions than through point mutations. To better visualise this statement, in the inset of Fig 4 we plot the following difference: for the two neighbourhoods. To specify which type of epistatic interaction we are referring to, in what follows, we will use the notation (Δf K ) rnd for the case in which the fitness differences correspond to simulations with random neighbourhood, and (Δf K ) adj for the adjacent ones (respectively the markers 'o' and '+' in Fig 4). In the absence of epistatic interactions, i.e. K = 0, we can note that (Δf K ) rnd = (Δf K ) adj = 0. Then, in the presence of epistatic interactions, (Δf K ) rnd is monotonically increasing between 0 < K � 2. Then for 2 < K � 31, (Δf K ) rnd is monotonically decreasing, and for K > 31 it is again monotonically increasing. For random epistatic interactions between 2 < K � 50, the fitness values for the case with inversion are not very different from those of point mutations (note in Fig 4 that, in this interval, the red and blue curves with marker 'o' are very close to each other).
On the other hand, we can observe that for 5 < K � 65, (Δf K ) adj is monotonically decreasing, and for K > 65 it is again monotonically increasing. We can also observe that, between K > 0 and K ¼ :  Fig 4, the gap between the tails of the red and blue curves). Therefore, our results show that in the presence of inversion it is possible to reach higher fitness when compared to adaptive walks with only point mutations.
A direct interpretation of this result is given by the properties of the inversion's mutational network as it has been described above (see e.g. Fig 2). Indeed, as it is more densely connected than the point-mutation mutational network, it is likely to allow a larger exploration of the fitness landscape and thus reach higher peaks, as observed here. However, given that the ruggedness of a fitness landscape depends on the mutational operator at work, an alternative explanation is that inversion mutations result in a smoother fitness landscape than that of point mutations, hence facilitating the finding of trajectories leading to higher peaks.
To test this assumption, we generalized the roughness measure introduced by Aita, Ikamura and Husimi in [54]. More precisely, we measured deviations from fitness additivity (in the language of the NK model, we say that a landscape is additive when it is non-epistatic, that is, K = 0). We here use the term roughness from [54] to distinguish this measure, which is a local one, from the classical ruggedness of the NK-fitness landscape which is a global property of the landscape. See, for example, Refs. [55] and [56], for other definitions of roughness and how they are calculated. Following the approach introduced in [54], we computed the roughness of the fitness landscape as the root mean square fitness variation due to each possible mutation, for both point mutations and inversions (see Methods for a formal mathematical definition of this measure). The results are shown in Fig 5. As expected, for point mutations the roughness of the fitness landscape is (almost) linearly proportional to the epistatic interaction parameter K, for both types of epistatic interaction neighborhoods (adjacent and random). In contrast, in the case of inversion mutations, the roughness is always greater than that of point mutations, this trend being particularly visible for random epistatic neighborhood when compared to adjacent neighborhood. Interestingly, even for K = 0, the roughness is already higher than for point mutations (both epistatic neighborhood being equivalent in that case) while for K = N − 1 the roughness converges approximately towards similar values both for inversions and point mutations whatever the epistatic neighborhood.
This result shows that inversion mutations actually don't smooth the fitness landscape. On the opposite, the average roughness increases much faster with K in the case of inversions than in the case of point mutations (the roughness of the inversion-based fitness landscape with K = 1 being similar to the one for point mutations with K = 50, Fig 5). This result also suggests a new explanation for the advantage of inversion mutations over point mutations. While the high connectivity of the inversions mutational network enables a better exploration of the fitness landscape, this effect is hampered (and not facilitated) by the effect of inversions on the roughness and while the combination of both positive (connectivity) and negative (roughness) is favorable for all values of K in the case of adjacent neighborhood, it is only favorable for high values of K in the random epistatic neighborhood. Indeed, for epistatic interactions with a random neighborhood, there is no noticeable difference between the average fitness values up to K ' 40 (red and blue curves with markers 'o' in Fig 4). This is likely to be due to the fact that inversions are segmental operators. When epistatic interactions are confined to a segment close to the focal nucleotide (which is the case for the adjacent neighborhood but not the random one), both segments can largely overlap, hence limiting the effect of the inversion to a set of epistatically interacting genes. This reduces the average roughness (compared to random epistatic neighborhood), leading to a more efficient exploration of the fitness landscape. Although a full mathematical proof is out of the scope of this paper, we develop a representative mathematical analysis that illustrates the origin of this pattern in S1 Text (section 3).

Discussion
The results presented in this paper show that intragenic inversion mutations lead adaptive walks to reach higher fitness peaks on rugged landscapes. We have performed simulations in NK landscapes and have shown that the expected fitness values are higher for inversions than for point mutations. This holds for all degrees of ruggedness (epistasis), ranging from singlepeak (K = 0), moderately rugged (1 � K < N − 1), to fully rugged landscapes (K = N − 1). Simulations with point mutations agreed well with the already known characteristics of the NK model, and made it possible to establish a "control group" to ensure the reliability of the differences with inversion mutations. We also observed that for adjacent epistatic interactions, the differences of expected fitness values between inversions and point mutations are greater than in the case of random epistatic interactions. In this sense, we conjecture that this should be the consequence of a synergistic effect between inversions and adjacent epistatic interactions, epistatic adjacency enabling a set of interacting loci to be inverted at once without affecting other, non-interacting, loci (see S1 Text for a detailed discussion). We believe that the relationship between epistasis and structural inversions is an area that has not yet been deeply explored.
Our analysis consisted of adaptive processes driven by mutation-specific evolutionary steps, i.e. in addition to the widely used point mutations, we introduced a minimal model of inversion mutations in double-stranded (digital) genomes. This model has also revealed some consequences of the limits of simulations with point mutations. We showed that, in addition to ruggedness due to epistatic interactions, the escape process can also depend on the interrelationships between the genotype space and the fitness landscape that are mediated by the type of mutation. In particular, we showed that for inversion mutations, the graph-theoretical representation of accessible mutants displayed a complex topology, in comparison with the canonical genotype space constructed with point mutations. By definition, the node degree of the graph of mutations is the number of accessible mutants. In the case of inversions this number is no longer constant over the node set-as in the case of point mutations-but varies depending on the specific sequence composition. Therefore, although it is correct that we can generate genotypic space through point mutations, that does not (strictly) imply that evolutionary paths in the fitness landscape have to be solely through point mutations. In this sense, the inversion mutations allowed us to reveal new topological properties of the interconnection between genotypes through what we have defined as mutational networks. Indeed, it is this mutational network that mediates the interconnection between genotype space and the adaptive dynamics in the fitness landscape. The mutational network can be translated as a fitness network when the fitness values of each mutant genotype are included. Thus, revealing the directions of possible evolutionary paths. Let us remark that graph theory is a framework used in various models of evolution (see for example Refs. [57][58][59][60][61][62][63][64][65][66][67]) and has been advantageous in analysis that use the Kauffman model, such as [51,[68][69][70][71][72]. However, most of these graph representations for mutations and fitness landscapes are isomorphic to the (hypercubic) genotype space, whereas our definition of mutational and fitness networks are not necessarily isomorphic to the genotype space. Moreover, we showed that for fitness networks generated by inversions, there are more mutational pathways between genotypes compared to point mutations. Therefore, for inversion mutations, at each step an evolutionary process can potentially explore more accessible mutants, than the "classical" estimation ðjA L j À 1ÞN, for any alphabet A L ; 8L 2 f2n : n 2 Ng, with size jA L j ¼ 2n. In this sense, our work can straightforwardly complement the results reported in [73], for jA L j > 2 with point mutations. The main message here is that in addition to the well-known utility of the fitness landscape metaphor, its topographic properties (due to epistasis) are not sufficient for modelling the escape from local peaks, and additional information should be included via the topology of mutational networks and fitness networks. This information can be useful to predict evolutionary trajectories in fitness landscapes [44,55,56,[74][75][76].
An important takeaway from this work is that an effort must be made to incorporate features of the structural variations of genomes, such as submicroscopic intragenic rearrangements. In this paper, we have taken a first step in this direction by modelling inversion mutations. A by-product of the construction of our model of inversion mutations revealed topological properties associated with the genotypic space and the accessible mutants for potential evolutionary paths. This is a consequence of the combinatorics of accessible mutants, which depends on structural aspects of this type of chromosomal rearrangement. On the other hand, our graph theoretical representations are consistent with the idea of adaptive walks in complex networks [16,17,77]. The key difference is that in our model we do not have to postulate a priori a network that satisfies a certain topology (e.g. scale-free or random) as in [16,17,77]. In our case, the topology arises as a consequence of the type of mutations. Following the interpretation provided in [17] about multiple mutations in a single evolutionary step, we suggest that another alternative to justify (or interpret) their topological inspired walks is via generic structural variations in concomitance with our model. In the case of the simulations of adaptive walks on complex networks reported in [77], the authors state that "it seems more realistic to ponder sequence spaces where the node's connectivity is not the same for every node, as it is in hypercubes". We agree with the authors of [77] and [17] that the degree of a genotype (in the mutational network for us) measures the availability of accessible mutants. This is what we have proposed to define as mutability, that is, the ability to change from one genotype to another under a mutation. We believe that it could be interesting to explore this notion of mutability and its relationship with the genetic potential of mutations that give rise to novel (beneficial) phenotypes [78].
In this work, we studied the effect of inversion mutations on the maximum fitness reached on rugged landscapes and compare it to the maximum fitness reached by point mutations. Importantly, both kind of mutations have been tested in independent simulations under the Strong Selection Weak Mutation regime. Hence, although we have shown that inversions reach higher fitness values in these conditions, we cannot compare the evolutionary dynamics between these two mutational setups. However, it is important to stress that, in a real population, both kind of mutations would not occur in isolation. Any evolving population undergoes all mutation types, including both point mutations and inversions. Hence, inversion mutations should not be considered in competition with point mutations, but rather as a synergistic interaction. Studying the consequences of this interaction on the evolutionary dynamics constitutes one of the most exciting perspective of this work.
In our model we did not include recombination [79] and other chromosomal rearrangements, such as duplications, deletions, and translocations. However, our purpose with inversion mutations has been to exemplify a simple mechanistic model of structural variation. Of course, our sequence model includes many simplifications. In particular, we use binary sequences and simulated a fully coding compact genome with a circular double-stranded DNA. Although most of our theoretical results hold in a more general case, the effect of inversions on more realistic coding sequences (with e.g. a 4 bases alphabet, multiple reading frames and ORF identified by start and stop codons and separated by non-coding sequences) could reveal other properties of interest. For instance, micro-inversions effect on reading frames is likely to be specific and very different from the effect of point mutations (that don't shift the reading frames) or InDels (that are likely to shift it). Indeed, inversions can alter a subsequence of an ORF without changing the reading frame of the start/stop codons. On the opposite, an inversion can easily remove (or create) a stop codon.
All throughout this study, we focused on binary sequences, simplifying the 4-bases nature of real genomes. Although the binary description is very common in theoretical and computational models, it is important to mention that the properties of the different mutational operators, hence of the generated mutational networks, may differ depending on the size of the alphabet [73]. In the case of inversions, although our main conclusions about the complex structure of the mutational network still hold, a 4-bases alphabet with two pairs of complementary bases (A-T and C-G) would introduce important properties compared to the binary case. Indeed, given the mathematical definition of inversions (Eq 5), it is straightforward that the composition of the inverted segment will conserve the relative fraction of AT and CG pairs relatively to the original one (a specific situation being the inversion of a segment of size one that can only switch A and T or C and G). It immediately follows that inversions cannot change the AT/CG ratio of the sequence and that, for a sequence of length N, the mutational network generated by inversions contains at least N + 1 disconnected sub-networks (which sizes will depend on the AT/GC ratio of the sequence, strongly biased sequences leading to smaller subnetworks). Hence, compared to the binary model, the 4-bases model increases the size and connectivity of the mutational network generated by point mutations [73]. On the opposite, in the inversion-generated mutational network, it isolates several sub-networks from each others. The effect on evolutionary dynamics, as we studied it here using the NK landscape, is still to be explored. Indeed, depending on the composition of the initial sequence, with a 4-bases alphabet some local/global optima may not be accessible. Consequently, the advantage of inversion mutations may be reduced, and even be cancelled if the number of local optimum is very low (i.e. for K � N). However, it is worth mentioning that real genomes undergo both kinds of mutational events and that point mutations connect the inversion-isolated sub-networks by changing the AT/GC ratio of the sequence. Exploring how both kinds of mutations (and others) interact is clearly beyond the scope of this manuscript, but studying the synergistic effect of inversions and point mutations in a more sequence-realistic model like the Aevol model [80,81] is clearly an appealing perspective.
Despite the simplifications used in this study, our results show that structural inversions could be considered not only as changes in the orientation of sequences that don't alter the genetic content, as classically supposed in the literature [39], but also as a source of intragenic variations. In this sense, our phenomenological model is supported by the empirical evidence of an intragenic inversion associated with the creation of new regulatory elements-required e.g. for the termination-activation of transcription in the nitric oxide synthase gene in Lymnaea stagnalis [30]. As well, the pathogenic mutation due to an intragenic inversion of seven nucleotides in human mitochondrial DNA [29]. Furthermore, although first generation sequencing technologies were unable to identify submicroscopic rearrangements [82], the development and availability of novel sequencing technologies [28,82], opens the possibility of characterising intragenic structural variations and may be of particular importance to unravel new aspects of mutations in molecular evolution. Therefore, we may soon require new theoretical and computational models to simulate the fullness of chromosomal rearrangements in evolutionary biology. We hope this work makes a first step in this direction.

Conclusion
The statements presented in this paper provided computational evidence that, for a very simple model of evolution in the strong selection weak mutation limit, an adaptive process in rugged landscapes driven by intragenic inversion mutations can reach higher fitness values (compared to a same process driven by point mutations). Therefore, this implies that intragenic inversion mutations can lead evolution to escape local fitness peaks in rugged landscapes. The way our model was conceived also proves that escape from a local peak of fitness can occur in constant environments without contingencies. Our model for inversion mutations not only elucidated an escape mechanism, but have also made it possible to uncover interesting aspects about the combinatorics of inversions and their relationship with mutated genotypes, genotype spaces and fitness landscapes in terms of graphs representations.

The model
Preamble: Limits in the single-nucleotide mutation scenario. It is worthwhile to state the main issue when point mutations are the only source of genetic variations in evolutionary models. At the molecular level and besides the fitness values, the structure of DNA mutations constrains the way evolution can move through the genotype space. For example, a common reasoning in molecular evolution theory is: for any alphabet A L ; 8L 2 f2n : n 2 Ng, with size jA L j ¼ 2n (this total number of letters with even parity being due to the double-stranded structure), a sequence x of N nucleotides, would have mutant neighbours differentiated by a single point mutation [45,49,83,84]. These D neighbouring genotypes are available for natural selection. Then, at the SSWM limit, only one of these neighbouring sequences can be fixed, chosen among those with fitness values higher than the wild-type fitness. Once a new mutant is fixed, a new set of D mutant neighbours is available for selection. Repeating this process unfolds an evolutionary path until it reaches a local (or global) fitness peak in the rugged landscape. However, if a local peak is reached (i.e. the state when all D accessible genotypes have strictly lower fitness values), then the evolutionary process is "trapped" because any other fitter genotype is at two or more point-mutational steps away (i.e. only attainable by "descending" through a valley in the fitness landscape). Digital sequence scheme. First, let us recall the very basic and well-known notion that DNA is a double strand molecule with two nucleotides chains, held together by complementary pairing of adenine (A) with thymine (T) and guanine (G) with cytosine (C). Given a DNA strand, as for example ATCGATTGAGCTCTAGCG, its complementary strand is TAGCTAACTC GAGATCGC, which in the IUPAC's notation is where the leading strand is on top and the DNA strand orientation is by convention 5 0 ! 3 0 . Throughout the presentation of our model, we adopt the alphabet A 2 ¼ f0; 1g, so the genotypes are binary sequences of (constant) length N 2 N �2 . As a low-level structural representation consistent with the DNA molecular biology, these genotypes are double-stranded sequences All the sequences x of size N define the set X 2 f0; 1g N of possible genotypes.
In analogy with the example above, the representation of this double-stranded digital sequence is: It is very important to clarify that our schematic representation should not be confused with the usual encoding A DNA ! A 2 , with the convention purines {A, G} ! 0 and pyrimidines {T, C} ! 1. The artificial genetics in our model only have two nucleotides, and they complement each other. We also adopt the following approximations: • We assume that the sequences are circular, i.e. periodic boundaries: • We neglect the existence of coding sequences separated by non coding regions. Biologically, this corresponds to compact genomes with almost no non-coding sequences. In this sense, our model mimics the molecular evolution of some viruses [40] or (animal) mitochondrial DNA [43]. Consequently, we are modelling intragenic mutations [44,82].
• We neglect geometrical aspects such as physical configurations like folding, twists, coiled structures, hairpin loops, etc.
• We do not consider transcription-translation processes.
• We do not include recombination, so we are modelling asexual replication.

Structural inversions model.
To establish the main idea of DNA inversion mutations well, now let us illustrate this mechanism with the following representation: where in the middle is depicted the segment where the mutation occurs and their corresponding inversion (boxes and colours highlight the segment where the inversion occurs). A glance over schema (2) shows why a double-stranded-like model is unavoidable to model intragenic inversions. For computational purposes, an inversion mutation can be split in two operations: • The conjugation operationĈ: for i, j 2 {1, . . ., N}.
Note that, as the genome is circular, there is no relation of order between i and j (i.e.Ĉ and P are well-defined even when i > j). Other sites k = 2 {i, . . ., j}, remain unchanged. Then, we have the (two-step) inversion operation:Î

�Ĉ∘P: ð5Þ
For example: Trivially, it can also be verified thatĈ andP commutes: Besides, Eq 5 can also define a single-locus mutation: when i = j, thenÎ is a single bit-flip i.e. a point mutations.
Computationally, these operations can be easily implemented through Algorithm 1: Mutate. With this simple algorithm, it is possible to calculate the combinatorics of the inversion mutations. For example, the enumeration of accessible mutants for each genotype with N = 4 shown in

NK model
A well known model of genetic epistatic interactions is the NK family of rugged multipeaked fitness landscapes [45][46][47][48]. In this model, besides the genome length N 2 N, the integer K 2 Z ð0;NÀ 1Þ , describes the epistatic interactions between loci in the genome and the contribution of each component to the total fitness, which depends on its own value as well as the values of K other loci. The fitness per locus is formally defined as: Here f i ðx i ; x i 1 ; . . . ; x i K Þ depends on the state of locus x i 2 {0, 1} and K other loci x i K 2 f0; 1g. The f i 's are given by N � 2 K+1 independent and identically distributed random variables sampled from a given uniform probability distribution. See the example shown in [47, Table I], for a very illustrative description for the computing of the epistatic contribution per locus. The pattern into which the scheme of interaction between loci is connected is known as the epistatic neighbourhood [46,47]. In our simulations we use two popular neighbourhood models: • The adjacent neighbourhood model, where i and the K other sites are successively ordered, i.e. i, i + 1, . . ., i + K (each variable modulo N when using periodic boundary conditions).
• The random neighbourhood model, where i and the K other loci are chosen at random according to a uniform distribution from {1, 2, . . ., N}.
The most important feature of the NK model is that the parameter K tunes the landscape ruggedness, that is the distribution of fitness local maximums, ranging from non-epistatic interactions when K = 0 (a Mount-Fuji-like landscape with a single peak), to the full rugged (or random) landscape when K = N − 1.

Adaptive walks
The zero-order approximation of our model to population genetics theory is on the limit of strong selection weak mutation (SSWM) [49,Ch. 5]. In this limit, the adaptive walk model describes very well the molecular evolution of isogenic (monomorphic) populations, as the sequential fixing of novel beneficial mutations. Therefore, the simulation of evolutionary processes in our digital model can easily be translated within this framework. That is, instead of describing a population of organisms with a pool of genotypes, it is sufficient to simulate the evolutionary trajectory over the fitness landscape of a single initial genotype and its successive mutations.
The procedure goes as follows: a (randomly chosen) starting genotype x 2 X varies through successive mutations (calculated with Algorithm 1: Mutate) resulting in a mutated sequence y 2 N n ðxÞ, where N n ðxÞ is the set of all accessible mutants of genotype x. Then, the fitness f(x) and f ðyÞ are calculated according to the NK model with Eq 8. If f(y) > f(x), the mutated genotype is selected, otherwise other mutations on x are tested until the fitness increases. In Algorithm 1: Mutate, the loci [i, j]2{1, . . ., N} are drawn from a pseudo random number generator function. With this recipe, the evolutionary dynamics is simulated up to a local fitness maximum is reached, i.e. x 2 X satisfies: f ðyÞ < f ðxÞ; 8y 2 N n ðxÞ. In other words, we verify that all mutants for a given genotype do not have higher fitness values, if not, the simulation continues.
The main routine to simulate adaptive walks on the Kauffman's NK-fitness landscape model, with point mutations (as usual) and inversions (as new) is available at https://gitlab. inria.fr/letrujil/getting-higher. Our code is based on the one developed by Wim Hordijk (in its version of August 23, 2010 and which is available at http://www.cs.unibo.it/*fioretti/CODE/ NK/), which uses some code from Terry Jones (https://github.com/terrycojones/nklandscapes).

Roughness measure
The roughness to slope ratio proposed by Aita, Iwakura, and Husimi in [54], can be re-interpreted in terms of the local measure of roughness of the surface of a solid material or an irregular interface, i.e. as the root mean square surface width in function of the height at a given place on the surface (see for example [85, p.22]). In our case if we assume that the "height" is equivalent to the value of the fitness f(x) of genotype x, and "the place on the surface" corresponds to a domain (on the surface) of the fitness landscape, then we can define the measure where the index ν denotes point mutations (P) or inversions (I), and E(m ν (x)) is the set of all possible mutations y of a given genotype x (i.e. the edges of the directed multigraph of mutations m ν (x)).
Supporting information S1 Fig. NK fitness networks for epistatic interactions with adjacent neighbouring. Representative instances of the NK model for N = 4 and their fitness networks in layered representation. The layers are constructed such that each node is assigned to the first possible layer, with the constraint that all its predecessors must be in earlier layers. The colors of the nodes correspond to the values of the out-degrees, i.e. the number of edges going out of a node (note that color scales differ in range between panels). Therefore, nodes with node out-degrees equal to zero correspond to local fitness maxima (sink nodes).