Landscape Encodings Enhance Optimization

Hard combinatorial optimization problems deal with the search for the minimum cost solutions (ground states) of discrete systems under strong constraints. A transformation of state variables may enhance computational tractability. It has been argued that these state encodings are to be chosen invertible to retain the original size of the state space. Here we show how redundant non-invertible encodings enhance optimization by enriching the density of low-energy states. In addition, smooth landscapes may be established on encoded state spaces to guide local search dynamics towards the ground state.


Introduction
Complex systems in our world are often computationally complex as well. In particular, the class of NP-complete problems [1], for which no fast solvers are known, encompasses not only a wide variety of well-known combinatorial optimization problems from the Traveling Salesman Problem to graph coloring, but also includes a rich diversity of applications in the natural sciences ranging from genetic networks [2] through protein folding [3] to spin glasses [4][5][6][7]. In such cases, heuristic optimization, where the goal is to find the best solution that is reachable within an allocated time, is widely accepted as being a more fruitful avenue of research than attempting to find an exact, globally optimal solution. This view is motivated at least in part by the realization that in physical and biological systems, there are severe constraints on the type of algorithms that can be naturally implemented as dynamical processes. Typically, therefore, we have to deal with local search algorithms. Simulated annealing [8], genetic and evolutionary algorithms [9], as well as genetic programming [10] are the most prominent representatives of this type. Their common principle is the generation of variation by thermal or mutational noise, and the subsequent selection of variants that are advantageous in terms of energy or fitness [11].
The performance of such local search heuristics naturally depends on the structure of the search space, which, in turn, depends on two ingredients: (1) the encoding of the configurations and (2) a move set. Many combinatorial optimization problems as well as their counterparts in statistical physics, such as spin glass models, admit a natural encoding that is (essentially) free of redundancy. In the evolutionary computation literature this "direct encoding" is often referred to as the "phenotype space", $X$. The complexity of optimizing a cost function $f$ over $X$ is determined already at this level. For simplicity, we call $f$ energy and refer to its global minima as ground states. In evolutionary computation, one often uses an additional encoding $Y$, called the "genotype space", on which search operators, such as mutation and cross-over, are defined more conveniently [12,13]. The genotype-phenotype relation is determined by a map $\alpha : Y \to X \cup \{\emptyset\}$, where $\emptyset$ represents phenotypic configurations that do not occur in the original problem, i.e., non-feasible solutions. For example, the tours of a Traveling Salesman Problem (TSP) [14] are directly encoded as permutations describing the order of the cities along the tour. A frequently used encoding as binary strings represents every connection between cities as a bit that can be present or absent in a tour; of course, most binary strings do not refer to valid tours in this picture.
The move set (or more generally the set of search operators [15]) defines a notion of locality on $X$. Here we are interested only in mutation-based search, where for each $x \in X$ there is a set of neighbors $N(x)$ that is reachable in a single step. Such neighboring configurations are said to be neutral if they have the same fitness. Detailed investigations of fitness landscapes arising from molecular biology have led to the conclusion that high degrees of neutrality can facilitate optimization [11,16]. More precisely, when populations are trapped in a metastable phenotypic state, they are most likely to escape by crossing an entropy barrier, along long neutral paths that traverse large portions of genotype space [17].
In contrast, some authors advocate using "synonymous encodings" for the design of evolutionary algorithms, where genotypes mapping to the same phenotype $x \in X$ are very similar, i.e., $\alpha^{-1}(x)$ forms a local "cluster" in $Y$, see e.g. [13,18,19]. This picture is incompatible with the advantages of extensive neutral paths observed in biologically inspired landscape models [16,20] and in genetic programming [21,22]. An empirical study [23], furthermore, shows that the introduction of arbitrary redundancy (by means of random Boolean network mapping) does not increase the performance of mutation-based search. This observation can be understood in terms of a random graph model of neutral networks, in which only very high levels of randomized redundancy result in the emergence of neutral paths [24].
An important feature that appears to have been overlooked in most recent literature is that the redundancy of $Y$ with respect to $X$ need not be homogeneous [12]. Inhomogeneous redundancy implies that the size of the preimage $|\alpha^{-1}(x)|$ may depend on $x \in X$. If $|\alpha^{-1}(x)|$ is anti-correlated with the energy $f(x)$, then the encoding $Y$ enables the preferential sampling of low-energy states in $X$. Thus even a random selection of a state yields lower energy when performed in $Y$ than in $X$. Here we demonstrate this enrichment of low-energy states for three established combinatorial optimization problems and suitably chosen encodings. The necessary formal aspects of energy landscapes and their encodings are outlined in the Methods section, where we also formalize enrichment in terms of the densities of states on $X$ and $Y$. We illustrate the effects of encoding by comparing the performance of optimization heuristics on the direct and encoded landscapes.

Number Partitioning
The first optimization problem we consider is the number partitioning problem (NPP) [1]: this asks if one can divide $n$ positive numbers $a_1, a_2, \ldots, a_n$ into two subsets such that the sum of elements in the first subset is the same as the sum over elements in the other subset. The energy is defined as the deviation from equal sums in the two subsets, i.e.,
$$f(x) = \left| \sum_{i=1}^{n} a_i x_i \right|,$$
where the two choices $x_i \in \{-1,+1\}$ correspond to assignment to the first or to the second subset, respectively. The flipping of one of the spin variables $x_i$ is used as a move set, so that the NPP landscape is built on a hypercube. The NPP shows a phase transition between an easy and a hard phase. We consider here only instances that are hard in practice, i.e., where the coefficients $a_i$ have a sufficiently large number of digits [25]. The so-called prepartitioning encoding [26] of the NPP is based on the differencing heuristic by Karmarkar and Karp [27]. Starting from an NPP instance $(a_1, \ldots, a_n)$, the heuristic removes the largest number, say $a_i$, and the second largest, $a_j$, and replaces them by their difference $a_i - a_j$. This reduces the problem size from $n$ to $n-1$. After iterating this differencing step $n-1$ times, the single remaining number is an upper bound for, and in many cases a good approximation to, the global minimum energy. The minimizing configuration itself is obtained by keeping track of the items chosen for differencing. Replacing $a_i$ and $a_j$ by their difference amounts to putting $a_i$ and $a_j$ into different subsets, i.e., $x_i = -x_j$.
The prepartitioning encoding is obtained by modifying the initial condition of the heuristic. Each number $a_i$ is assigned a class $y_i \in \{1, \ldots, n\}$. A new NPP instance $a'_1, \ldots, a'_n$ is generated by adding up all numbers $a_i$ in the same class $y_i$ into a single number $a'_{y_i}$. After removing zeros from $a'$, the differencing heuristic is applied to $a'$. In short: $y_i = y_j$ imposes the constraint $x_i = x_j$.
Running the heuristic under this constraint, the resulting configuration $x = \alpha(y)$ is unique up to flipping all spins in $x$.
The mapping $\alpha : Y \to X$ so defined is surjective because, for each $x \in X$, $\alpha(y) = x$ for $y_i = 1$ if $x_i = 1$ and $y_i = 2$ otherwise. Two encodings $y, z \in Y$ are neighbors if they differ at exactly one index $i \in \{1, \ldots, n\}$. This encoding is the one whose performance we will compare with the direct encoding later.
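The prepartitioning map can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `kk_spins`, `decode` and `energy` are hypothetical, ties between equal numbers are broken arbitrarily by the heap, and empty classes are simply skipped instead of being created and then removed as zeros.

```python
import heapq

def kk_spins(values):
    """Karmarkar-Karp differencing on a non-empty list of numbers.

    Returns (residual, spins): the final difference and a +1/-1
    assignment that realizes it.
    """
    n = len(values)
    heap = [(-v, i) for i, v in enumerate(values)]  # max-heap via negation
    heapq.heapify(heap)
    opposite = [[] for _ in range(n)]  # pairs forced into different subsets
    while len(heap) > 1:
        a, i = heapq.heappop(heap)  # largest number (stored negated)
        b, j = heapq.heappop(heap)  # second largest (stored negated)
        opposite[i].append(j)
        opposite[j].append(i)
        heapq.heappush(heap, (a - b, i))  # their difference, stored negated
    residual = -heap[0][0]
    # The recorded pairs form a tree; two-colour it to obtain the spins.
    spins = [0] * n
    spins[0] = 1
    stack = [0]
    while stack:
        u = stack.pop()
        for v in opposite[u]:
            if spins[v] == 0:
                spins[v] = -spins[u]
                stack.append(v)
    return residual, spins

def decode(a, y):
    """Prepartitioning map: class labels y -> (spin configuration x, residual)."""
    classes = sorted(set(y))  # only occupied classes, so no zeros arise
    slot = {c: k for k, c in enumerate(classes)}
    merged = [0] * len(classes)
    for ai, yi in zip(a, y):
        merged[slot[yi]] += ai  # numbers in one class are added up
    residual, class_spin = kk_spins(merged)
    x = [class_spin[slot[yi]] for yi in y]  # same class -> same spin
    return x, residual

def energy(a, x):
    """NPP energy: deviation from equal subset sums."""
    return abs(sum(ai * xi for ai, xi in zip(a, x)))
```

Calling `decode(a, y)` with all labels distinct reproduces the plain differencing heuristic, while giving two numbers the same label forces them into the same subset.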

Traveling Salesman
Our next optimization problem, the Traveling Salesman Problem (TSP), is another classical NP-hard optimization problem [1]. Given a set of $n$ vertices (cities, locations) $\{1, \ldots, n\}$ and a symmetric matrix of distances or travel costs $d_{ij}$, the task is to find a permutation (tour) $\pi$ that minimizes the total travel cost
$$f(\pi) = \sum_{i=1}^{n} d_{\pi_i \pi_{i+1}},$$
where indices are interpreted modulo $n$. Here, the states of the landscape are the permutations of $\{1, \ldots, n\}$, $X = S_n$. Two permutations $\pi$ and $\sigma$ are adjacent, $\{\pi, \sigma\} \in L$, if they differ by one reversal. This means that there are indices $i$ and $j$ with $i < j$ such that $\sigma_k = \pi_{i+j-k}$ for $i \le k \le j$ and $\sigma_k = \pi_k$ otherwise. Similar to the NPP case, an encoding configuration $y \in Y := \{1, 2, \ldots, n\}^n$ acts as a constraint. A tour $\pi \in X$ fulfills $y$ if for all cities $i$ and $j$, $y_i < y_j$ implies $\pi^{-1}(i) < \pi^{-1}(j)$. Thus $y_i$ is the relative position of city $i$ in the tour, since it must come after all cities $j$ with $y_j < y_i$. All cities with the same $y$-value appear in a single section along the tour. If there are no two cities with the same $y$-value then $y$ itself is a permutation and there is a unique $\pi \in X$ obeying $y$, namely $\pi = y^{-1}$.
Among the tours compatible with the constraint, a selection is made with the greedy algorithm. It constructs a tour by iteratively fixing adjacencies of cities. Starting from an empty set of adjacencies, we attempt to include an adjacency $\{i,j\}$ at each step. If the resulting set of adjacencies is still a subset of a valid tour obeying the constraint, the addition is accepted; otherwise $\{i,j\}$ is discarded. The step is iterated, proposing each $\{i,j\}$ exactly once in the order of increasing $d_{ij}$, so that short connections are tried first. This procedure establishes a mapping (encoding) $\alpha : Y \to X$. Since each tour $\pi$ can be reached by taking $y = \pi^{-1}$, $\alpha$ is complete. In the encoded landscape, two states $y, z \in Y$ are adjacent if they differ at exactly one position (city) $i$.
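The basic ingredients of the TSP landscape, namely the tour cost, the reversal move and the test whether a tour fulfills a constraint $y$, can be sketched as follows. The helper names are hypothetical and cities are numbered from 0 for convenience. The constraint is read here as: cities with a smaller $y$-value come earlier, while cities sharing a $y$-value may appear in any order within their section. The constrained greedy construction itself is not shown.

```python
def tour_cost(d, tour):
    """Total travel cost of a closed tour; indices wrap around modulo n."""
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))

def reverse_segment(tour, i, j):
    """Neighbor obtained by reversing positions i..j (inclusive, i < j)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def fulfills(tour, y):
    """True if every city with a smaller y-value precedes every city with a larger one.

    This is equivalent to the y-values being non-decreasing along the tour.
    """
    labels = [y[city] for city in tour]
    return all(labels[k] <= labels[k + 1] for k in range(len(labels) - 1))
```

For a constraint `y` without repeated values only one ordering passes `fulfills`, which matches the statement that such a `y` determines the tour uniquely.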

Maximum Cut
The last example we consider is a spin glass problem. Consider the set of configurations $X = \{-1,+1\}^n$ with the energy function
$$f(x) = -\sum_{i<j} J_{ij} x_i x_j$$
for a spin configuration $x \in X$. Proceeding differently from the usual Gaussian or $\pm J$ spin glass models [28,29], we allow the coupling to be either antiferromagnetic or zero, $J_{ij} \in \{-1, 0\}$. This is sufficient to create frustration and obtain hard optimization problems. Taking the negative coupling matrix $-J$ as the adjacency matrix of a graph $G$, the spin glass problem is equivalent to the max-cut problem on $G$, which asks to divide the node set of $G$ into two subsets such that a maximum number of edges runs between the two subsets [1]. The idea for an encoding works on the level of the graph $G$, which we assume to be connected. The set $Y$ of the encoding consists of all spanning trees of $G$. In the mapped configuration $x = \alpha(y)$, $x_i$ and $x_j$ have different spin values whenever $ij$ is an edge of the spanning tree $y$. Since a spanning tree is a connected bipartite graph, this uniquely (up to $+1/-1$ symmetry) defines the spin configuration $x$. The encoding $\alpha$ is not complete in general. Homogeneous spin configurations, for instance, are not generated by any spanning tree. Each ground state configuration $x_{\mathrm{ground}}$, however, is certain to be represented by a spanning tree due to the following argument. Suppose there is a minimum energy configuration $x_{\mathrm{ground}}$ that is not generated by any spanning tree. Then the subgraph of $G$ formed by all edges connecting unequal spins in $x_{\mathrm{ground}}$ is disconnected. We choose one of the connected components, calling its node set $C$. By flipping all spins in $C$, we keep all edges that connect unequal spins in $x_{\mathrm{ground}}$. Since $G$ is connected, we obtain at least one additional such edge from a node in $C$ to a node outside $C$. Thus we have constructed a configuration with strictly lower energy than $x_{\mathrm{ground}}$, a contradiction. Two spanning trees $y, z \in Y$ are adjacent if $z$ can be obtained from $y$ by addition of an edge $e$ and removal of a different edge $f$.
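Decoding a spanning tree into a spin configuration amounts to two-colouring the tree. The sketch below illustrates this under stated assumptions: nodes are numbered from 0, node 0 is arbitrarily given spin +1 to fix the global sign, and the energy is assumed to take the usual form $f(x) = -\sum_{i<j} J_{ij} x_i x_j$ with $J_{ij} = -1$ on the edges of $G$. All function names are hypothetical.

```python
from collections import deque

def spins_from_tree(n, tree_edges):
    """Map a spanning tree (list of node pairs) to spins in {-1, +1}.

    Adjacent tree nodes receive opposite spins.
    """
    adj = [[] for _ in range(n)]
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)
    x = [0] * n
    x[0] = 1  # fixes the global +1/-1 symmetry
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if x[v] == 0:  # not yet assigned
                x[v] = -x[u]
                queue.append(v)
    return x

def energy(graph_edges, x):
    """Energy with J = -1 on every edge of G: sum of x_i * x_j over edges."""
    return sum(x[i] * x[j] for i, j in graph_edges)

def cut_size(graph_edges, x):
    """Number of edges of G whose end points carry different spins."""
    return sum(1 for i, j in graph_edges if x[i] != x[j])
```

Under the assumed energy, `energy` equals the number of edges minus twice `cut_size`, so minimizing the energy and maximizing the cut are the same task.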

Enrichment
We now study enrichment as well as landscape structure on these three rather different problems. To this end we consider the cumulative density of states
$$Q_f(g) = \frac{|\{x \in X : f(x) \le g\}|}{|X|}$$
in the original landscape and $Q_{f \circ \alpha}$ defined analogously in the encoded landscape. In order to quantify the enrichment of good solutions, we compare the fraction $h$ of all states with an energy not larger than a certain threshold $g$ in the original landscape with the fraction $r(h)$ using the same threshold in the encoding. The encoding thus enriches low-energy states if $r(h) \gg h$ for small $h$. Figure 1 shows that this is the case for the three landscapes and encodings considered here. We find in fact that the density of states, measured by the ratio $r(h)/h$, is enriched by several orders of magnitude in the encoded landscape, for all the cases considered. Reassuringly, this trend of enrichment persists all the way to the ground state: that is, the encodings contain many more copies of the ground state than the original landscape. It appears in fact that the enrichment of ground states increases exponentially with system size. We can thus conclude that, with an appropriately encoded landscape, enrichment makes it easier both to find lower-energy states from higher-energy ones, providing more routes to the ground state, and to reach the ground state itself from a low-energy neighbor.

Neighborhoods and neutrality
We continue the analysis of the encodings with attention to geometry and distances. A neutral mutation is a small change in the genotype that leaves the phenotype unaltered. In the present setting, a neutral move in the encoding is an edge $\{y,z\} \in M$ such that $\alpha(y) = \alpha(z)$. In general, the set of neutral moves is a subclass of all moves leaving the energy unchanged. An edge $\{y,z\}$ with $f(\alpha(y)) = f(\alpha(z))$ but $\alpha(y) \ne \alpha(z)$ is not a neutral move in the present context. In the following, we examine the fraction of neutral moves for the encoded landscapes mentioned above. Figure 2(a) shows that the fraction of neutral moves approaches a constant value when increasing the problem size of NPP and max-cut. The fraction of neutral moves in the traveling salesman problem, on the other hand, decreases as $1/n$ with problem size $n$. The average number of neighbors encoding the same solution grows linearly with $n$, since the total number of neighbors is $n(n-1)$ for each $y \in Y$ in the TSP encoding.
If a move in the encoding is non-neutral, how far does it take us on the original landscape? We define the step length of a move $\{y,z\} \in M$ as the distance between the images of $y$ and $z$ on the original landscape,
$$s(\{y,z\}) = d_X(\alpha(y), \alpha(z)) \qquad (5)$$
using the standard metric $d_X$ on the graph $(X, L)$. Obviously, $\{y,z\}$ is neutral if and only if $s(\{y,z\}) = 0$. Figure 2(b) compares the cumulative distributions of step length for number partitioning and max-cut. It is intractable to obtain the statistics of $s$ for the TSP at larger problem sizes, since sorting by reversals, i.e., measuring distances w.r.t. the natural move set, is a known NP-hard problem [30]. For the encoding of number partitioning, step lengths are concentrated around $n/2$. Making a non-neutral move in this encoding is therefore akin to choosing a successor state at random. For the max-cut problem, the result is qualitatively different. Step lengths are broadly distributed, with most moves spanning a short distance on the original landscape. Based on this it is tempting to conclude that optimization proceeds in "smaller steps" on the max-cut landscape than in the NPP.
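For spin configurations with single-spin-flip moves, the graph distance on the hypercube is the Hamming distance, so the step length of an encoded move can be measured as sketched here. The names `hamming`, `step_length` and the toy decoder `two_class` are hypothetical; `two_class` merely stands in for a real encoding map, and the global spin-flip symmetry of the decoded configurations is ignored.

```python
def hamming(x, z):
    """Number of positions at which two configurations differ."""
    return sum(1 for xi, zi in zip(x, z) if xi != zi)

def step_length(decode, y, z):
    """Distance on the original landscape between the images of y and z."""
    return hamming(decode(y), decode(z))

def two_class(y):
    """Toy decoder (illustration only): class 1 -> +1, any other class -> -1."""
    return [1 if c == 1 else -1 for c in y]
```

A move is neutral exactly when `step_length` returns 0, as for `y = [1, 3, 1]` and `z = [1, 2, 1]` under the toy decoder.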

Evolutionary dynamics
One might ask if the encoded landscape also facilitates the search dynamics, by virtue of its modified structure, and offers another avenue for optimization. For this purpose, we consider an optimization dynamics as a zero-temperature Markov chain, in which a proposal $x'$ is accepted only if it does not increase the energy:
$$x(t+1) = \begin{cases} x' & \text{if } f(x') \le f(x(t)), \\ x(t) & \text{otherwise.} \end{cases}$$
This is an Adaptive Walk (AW) when the proposal $x'$ is drawn from the neighborhood of $x(t)$. In Randomly Generate and Test (RGT), proposals are drawn from the whole set of configurations independently of the neighborhood structure. Thus a performance comparison between AW and RGT elucidates whether the move set is suitably chosen for optimization. Because of the enrichment of low-energy states by the encodings, it is clear that RGT performs strictly better on the encoding than on the original landscape.
Adaptive walks also perform strictly better on the encoding than on the original landscape, at least in the long-time limit, cf. Figure 3. Beyond this general benefit of the encodings, the dynamics shows marked differences across the three optimization problems. In the NPP problem, RGT outperforms AW on the encoded landscape, so that enrichment alone is responsible for the increase in optimization with respect to the original landscape. In the encodings of the other two problems, AW performs better than RGT so that we can conclude that the improved structure of the encoded landscape is also an important reason for the observed increase in performance, in addition to simple enrichment. The dynamics on the max-cut landscapes (panel c) has the same qualitative behavior as that on the TSP (panel a). Although there is a transient for intermediate times where adaptive walks on the original landscape seem to be winning, the asymptotic behavior is clear: adaptive walks on the encoded landscape perform best.
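The two dynamics differ only in how proposals are generated, which a short sketch makes explicit. It assumes that proposals of equal energy are accepted (so that neutral moves are possible); the function names are hypothetical, and the proposal functions shown act on spin configurations of the direct encoding.

```python
import random

def search(energy, propose, state, steps, rng):
    """Zero-temperature dynamics: never accept a proposal that raises the energy.

    Returns the final state and the energy recorded after every step.
    """
    e = energy(state)
    trace = [e]
    for _ in range(steps):
        candidate = propose(state, rng)
        e_candidate = energy(candidate)
        if e_candidate <= e:  # assumption: ties are accepted
            state, e = candidate, e_candidate
        trace.append(e)
    return state, trace

def flip_one(x, rng):
    """Adaptive walk proposal: a random neighbor differing in one spin."""
    k = rng.randrange(len(x))
    return x[:k] + [-x[k]] + x[k + 1:]

def uniform_state(x, rng):
    """RGT proposal: a uniformly random configuration, ignoring the current one."""
    return [rng.choice((-1, 1)) for _ in x]
```

Running `search` once with `flip_one` and once with `uniform_state` from the same initial state gives two energy traces that can be compared step by step, in the spirit of Figure 3.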

Conclusion
We have examined the role of encodings in arriving at optimal solutions to NP-complete problems: we have constructed encodings for three examples, viz. the NPP, spin glass and TSP problems, and demonstrated that the choice of a good encoding can indeed help optimization. In the examples we have chosen, the benefits arise primarily as a result of the enrichment of low-energy solutions. A secondary effect in some but not all encodings considered here is the introduction of a high degree of neutrality. The latter enables a diffusion-like mode of search that can be much more efficient than the combination of fast hill-climbing and exponentially rare jumps from local optima. The two criteria, (1) selective enrichment of low-energy states and, where possible, (2) increase of local degeneracy, can guide the construction of alternative encodings that explicitly make use of a priori knowledge of the mathematical structure of the optimization problem. The qualitative understanding of the effect of encodings on landscape structures in particular resolves apparently conflicting "design guidelines" for the construction of evolutionary algorithms.

The beneficial effects of enriching encodings immediately pose the question whether there is a generic way in which they can be constructed. The constructions for the NPP and TSP encodings suggest one rather general design principle. Suppose there is a natural way of decomposing a solution $x$ of the original problem into partial solutions. We can think of a partial solution $\xi$ as the set of all solutions that have a particular property. In the TSP example, $\xi$ refers to a set of solutions in which a certain list $A$ of cities appears as an uninterrupted interval. Now we choose the encoding $y$ so that it has an interpretation as a collection $\Xi(y)$ of partial solutions. A deterministic optimization heuristic can now be used to determine a good solution $x^*(\Xi(y))$. In the case of the TSP, $\Xi(y)$ corresponds to a set of constrained tours from which we choose by means of the greedy heuristic. Alternatively, $\Xi(y)$ may over-specify a solution, in which case the optimization procedure would attempt to extract an optimal subset $\Xi' \subseteq \Xi(y)$ so that $\bigcap_{\xi \in \Xi'} \xi$ contains a valid solution $x^*$. In either case, $\alpha : y \mapsto x^*$ is an encoding that is likely to favour low-energy states. It is not obvious, however, that the spanning-tree encoding for max-cut can also be understood as a combination of partial solutions. It remains an important question for future research to derive necessary and sufficient conditions under which optimized combinations of partial solutions indeed guarantee that the encoding is enriching.

Landscapes and encoding
A finite discrete energy landscape $(X, L, f)$ consists of a finite set of configurations $X$ endowed with an adjacency structure $L$ and with a function $f : X \to \mathbb{R}$ called energy; accordingly, $-f$ is called fitness. The global minima of $f$ are called ground states. $L$ is a set of unordered pairs in $X$, thus $(X, L)$ is a simple undirected graph. Let $(Y, M)$ be another simple graph and consider a mapping $\alpha : Y \to X \cup \{\emptyset\}$, which we call an encoding of $X$. Then $(Y, M, f \circ \alpha)$ is again a landscape. (If we include states in $Y$ that do not encode feasible solutions we assign them infinite energy, i.e., $f \circ \alpha(y) = +\infty$ if $\alpha(y) = \emptyset$.) The encoding is complete if $\alpha$ is surjective, i.e., if every $x \in X$ is encoded by at least one vertex $y \in Y$. Both landscapes then describe the same optimization problem. In the language of evolutionary computation, $(Y, M)$ is the genotype space, while $(X, L)$ is the phenotype space corresponding to the "direct encoding" of the problem. With this notation fixed, our problem reduces to understanding the differences between the genotypic landscape $(Y, M, f \circ \alpha)$ and the phenotypic landscape $(X, L, f)$ w.r.t. optimization dynamics.

Test Instances
Random instances for max-cut (spin glass) are generated as standard random graphs [31] with parameter $p = 0.5$: each potential edge is present or absent with equal probability, independently of the other edges. Distances $d_{ij} = d_{ji}$ for the symmetric TSP and numbers $a_i$ for NPP are drawn independently from the uniform distribution on the interval $[0,1]$.
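Instance generation along these lines can be sketched with the Python standard library. The function names are hypothetical; note that `random.random()` samples from the half-open interval [0, 1), and that the sketch does not check whether the generated graph is connected, which the max-cut encoding assumes.

```python
import random

def random_graph(n, p, rng):
    """Edge list of a random graph: each pair {i, j} is present with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def random_distances(n, rng):
    """Symmetric distance matrix with independent uniform entries and zero diagonal."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = rng.random()
    return d

def random_numbers(n, rng):
    """NPP instance: n independent uniform numbers."""
    return [rng.random() for _ in range(n)]
```

Passing a seeded generator such as `random.Random(1)` as `rng` makes the instances reproducible.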

Enrichment factor and Density of States
The enrichment factor $r(h)$ can be obtained directly from the cumulative densities of states of the two landscapes:
$$r(h) = Q_{f \circ \alpha}\left(Q_f^{-1}(h)\right).$$
This expression is a well-defined function for arguments $h \in [0,1]$ because $Q_{f \circ \alpha}$ only changes value where $Q_f$ also does. For ground state energy $g_0$, the enrichment of the ground state is $Q_{f \circ \alpha}(g_0)/Q_f(g_0)$.
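Given lists of energies, either enumerated exhaustively or sampled uniformly, the two cumulative densities and the resulting pair $(h, r(h))$ at a common threshold $g$ can be estimated as in the following sketch. It reads $r(h)$ as the encoded cumulative density evaluated at the threshold where the original one equals $h$; the function names are hypothetical.

```python
from bisect import bisect_right

def cumulative_dos(energies):
    """Return g -> fraction of the given energies that are not larger than g."""
    ordered = sorted(energies)
    total = len(ordered)

    def q(g):
        return bisect_right(ordered, g) / total

    return q

def enrichment(energies_x, energies_y, g):
    """Fractions (h, r) of states with energy <= g in the original and the
    encoded landscape, respectively."""
    h = cumulative_dos(energies_x)(g)
    r = cumulative_dos(energies_y)(g)
    return h, r
```

With `g` set to the ground state energy, the ratio `r / h` is the ground state enrichment described above.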
The results in Figure 1(a-c) are obtained by sampling $2 \times 10^7$ uniformly drawn states each from the original states $X$ and the encoded states $Y$ for the traveling salesman. For the two other problems, the density of states of the original landscapes is exact by complete enumeration. For the spin glass, the density of states for $Y$ is also exact, from a calculation based on the matrix-tree theorem. For number partitioning, $2^n$ samples in $Y$ are drawn at random.
The enrichment of the ground state, Figure 1(d), is an average over 100 realizations for each problem type and size $n$. For each realization of number partitioning and max-cut, $2^n$ uniform samples in $Y$ are taken; the ground state energy itself is obtained by complete enumeration of $X$. For each realization of the traveling salesman problem, $10^9$ uniform samples are taken in $Y$; the ground state energy is computed with the Held-Karp algorithm [32].

Figure 3. Performance comparison between three types of stochastic dynamics: adaptive walks (AW) on the original and encoded landscapes and randomly generate and test (RGT) on the encoded landscape. The plotted performance value is the fraction of instances for which the considered evolutionary dynamics is "leading" at time $t$, i.e., has an energy not larger than the other two types of dynamics. For each landscape, 100 random instances are used with sizes $n = 30$ in panels (a) and (b), $n = 200$ in panel (c). On each of the instances, each type of evolutionary dynamics is run once with randomly drawn initial condition $y(0) \in Y$ for RGT and AW in the encoded landscape. The AW on the original landscape is initialized with the mapped state $x(0) = \alpha(y(0))$. Thus all three dynamics are started at the same energy. doi:10.1371/journal.pone.0034780.g003